Abstract

We consider the problem of approximating a smooth function from finitely-many pointwise 
samples using $\ell^1$ minimization techniques. In the first part of this paper, we introduce an
infinite-dimensional approach to this problem. Three advantages of this approach are as follows.
First, it provides interpolatory approximations in the absence of noise. Second, it does not 
require a priori bounds on the expansion tail in order to be implemented. In particular, the 
truncation strategy we introduce as part of this framework is independent of the function being 
approximated, provided the function has sufficient regularity. Third, it allows one to explain 
the key role weights play in the minimization; namely, that of regularizing the problem and 
removing aliasing phenomena. In the second part of this paper we present a worst-case error 
analysis for this approach. We provide a general recipe for analyzing this technique for arbitrary 
deterministic sets of points. Finally, we use this tool to show that weighted $\ell^1$ minimization
with Jacobi polynomials leads to an optimal method for approximating smooth, one-dimensional 
functions from scattered data. 


1 Introduction 

Many problems in science and engineering require the approximation of a smooth function from 
a finite set of pointwise samples. Although a classical problem in approximation theory, in the 
last several years there has been an increasing focus on the use of convex optimization techniques 
for this task [18, 20, 22, 25, 28, 29, 30, 34, 35]. This is driven in part by applications such as
uncertainty quantification, wherein the dimension is typically high and the amount of data severely 
limited. As dimension increases, smooth multivariate functions are increasingly well-represented 
by their best $k$-term approximation in certain orthogonal expansions (e.g. multivariate Legendre
polynomials). Hence the expectation is that these techniques will yield improvements over more
standard approaches - such as discrete least squares and interpolation - at least when the dimension
is sufficiently high and the data points arise from appropriate sampling distributions. A number of 
recent studies, such as those listed above, have shown this to be the case. 

1.1 Current approaches 

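Let $f$ be a function in $L^2_\nu(D)$, where $D \subseteq \mathbb{R}^d$ is a domain, let $\{\phi_i\}_{i \in \mathbb{N}}$ be an orthonormal basis and write

$$f = \sum_{i \in \mathbb{N}} x_i \phi_i,$$

where $x = \{x_i\}_{i \in \mathbb{N}} \in \ell^2(\mathbb{N})$ is the infinite vector of coefficients of $f$. If $\{t_n\}_{n=1}^N \subseteq D$ is a finite
set of points, the problem is to approximate $x$, and therefore $f$, from the data $\{f(t_n)\}_{n=1}^N$.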
Since $x$ is an infinite vector, in order to compute an approximation to $f$ it is necessary to
truncate in some way. In the usual formulation (see [18, 20, 22, 25, 28, 29, 30, 34, 35]), one
introduces a fixed $M > N$ and seeks to approximate the first $M$ coefficients $x_1, \ldots, x_M$ of $x$. If
$A = \{\phi_i(t_n)\}_{n=1,i=1}^{N,M} \in \mathbb{C}^{N \times M}$ then the standard (weighted) $\ell^1$ minimization problem is as follows:

$$\min_{z \in \mathbb{C}^M} \|z\|_{1,w} \ \text{subject to} \ \|Az - y\| \leq \delta, \qquad y = \{f(t_n)\}_{n=1}^N. \tag{1.1}$$

Here $\|z\|_{1,w} = \sum_{i=1}^M w_i |z_i|$ is the weighted $\ell^1$-norm with weights $w_i > 0$. The parameter $\delta$ handles
the truncation, and is normally chosen so that $\{x_i\}_{i=1}^M$ is feasible for (1.1). That is,


$$\max_{n=1,\ldots,N} \left| f(t_n) - \sum_{i=1}^M x_i \phi_i(t_n) \right| \leq \delta. \tag{1.2}$$


In other words, the error introduced by truncating the infinite expansion to a vector of length M 
is viewed as noise in the data. 

Unfortunately, this formulation raises a number of issues, which we describe next. Overcoming 
these issues is the goal of this paper. 


(i) In order to choose $\delta$, one must have an a priori estimate for the truncation error
$|f(t) - \sum_{i=1}^M x_i \phi_i(t)|$. Note that the approximation error resulting from (1.1) is sensitive to the choice
of $\delta$ [35]. In practice, cross-validation techniques have been proposed to empirically determine
the truncation error [18, 20, 35]. Yet such techniques are time-consuming, largely lack theoretical
support and may not always result in an accurate estimation.

(ii) The approximation $\tilde{f}$ of $f$ obtained from (1.2) does not interpolate the data. In the absence of
noise, interpolatory solutions are often desirable in applications since they ensure that the approximation
exactly fits the underlying function $f$ at the points at which $f$ is known.

(iii) The approximation $\tilde{f}$ can be dependent on the choice of weights, and is prone to aliasing (also
known as overfitting) if the weights are chosen inappropriately [30].

(iv) Besides some specific cases where such techniques are known to perform extremely well - such as
when the coefficients $\{x_i\}_{i=1}^M$ are sparse and the data points $\{t_n\}_{n=1}^N$ are chosen randomly according
to the orthogonality measure of the basis [D EQl ESI ESI EQ] - very little is known about
the approximation error $|f(t) - \tilde{f}(t)|$. In particular, for general $f$ (not necessarily having sparse
coefficients) and arbitrary deterministic scattered points $\{t_n\}_{n=1}^N$ the quality of the approximation
$\tilde{f}$ to $f$ is largely unknown.


Issue (iv) has ramifications for a variety of applications where the primary limitation is the availability
of data - that is, where it is time-consuming or expensive to acquire more samples - as
opposed to data-rich scenarios where processing speed is the key concern (in which case classical
techniques such as least-squares fitting are likely superior). If weighted $\ell^1$ minimization techniques
are to find wide use in practice, then it is beneficial to have error bounds for both ideal (i.e. random 
sample points) and non-ideal conditions (i.e. fixed, deterministic sample points). 


1.2 Our contributions 

The purpose of this paper is to address these issues. In §3 we first propose an infinite-dimensional
weighted $\ell^1$ minimization problem which removes the need for a priori knowledge of the magnitude of
the expansion tail. In the absence of noise, its solutions are exactly interpolatory, unlike solutions
of (1.1). As one might expect, however, such an infinite-dimensional minimization problem cannot
be solved numerically. Hence we next introduce a truncation strategy based on a user-controlled
parameter $K \in \mathbb{N}$. This leads to a finite-dimensional minimization problem over $\mathbb{C}^K$, reminiscent
of (1.1) but with a number of key differences. First, unlike (1.1), it requires no knowledge of
the expansion tail, and second, it retains the interpolatory property of the infinite-dimensional
problem. In §6 we show how to select the parameter $K$ in a manner independent of $f$, whenever $f$
has sufficient regularity, and dependent only on the basis and data points $\{t_n\}_{n=1}^N$.

Formulating the minimization problem in an infinite-dimensional setting also allows us to address
issue (iii). In §4 we first show that unweighted $\ell^1$ minimization is largely unsuitable for
function interpolation from scattered, deterministic data, since it leads to an aliasing phenomenon.
Specifically, without weights it is possible for the optimization problem to have infinitely-many 
solutions which interpolate / at the data points, but which do not approximate / to any accuracy 
away from these points. Fortunately, this problem can be completely resolved by the introduction 
of slowly-growing weights. In effect, these weights regularize the optimization problem and ensure 
that such bad solutions of the unweighted problem, while still feasible, are no longer minimizers 
of the weighted problem. Through subsequent analysis we quantify how fast the weights need to 
grow to resolve this phenomenon, and demonstrate this result with numerical examples. 

Issues (i)-(iii) are the focus of the first half of this paper (^SHS]). In the second half, we consider 
(iv). More precisely, we pose and answer the following two questions: 

(a) How well can one approximate a function $f$ using weighted $\ell^1$ minimization from its samples
taken on an arbitrary deterministic grid of $N$ points?

(b) How does this approximation perform in comparison to other techniques, such as least squares? 

Note that we do not assume any sparsity of the coefficients $x = \{x_i\}_{i \in \mathbb{N}}$ of $f$, although we do
assume some mild decay of $x_i$ as $i \to \infty$ (otherwise the weighted problem does not make sense).
We also do not assume any structure to the data: the points $\{t_n\}_{n=1}^N$ are deterministic and can be
arbitrarily distributed in the domain. As is standard in scattered data approximation, we classify
the error in terms of the density (or fill distance) of the points [32].

Our motivation for examining (a) and (b) is the following. Least-squares fitting is a classical and 
widely-used technique, but is well known to be intensive in the number of samples required to achieve 
stability and accuracy (see miissiiM] and references therein). Conversely, under certain conditions 
(sparsity and random sampling) $\ell^1$ minimization techniques are known to give very good approximations from
relatively few samples. However, in certain practical scenarios - such as when using legacy data - 
one may not have the luxury to choose the data points in a way to deliver the best accuracy of the 
approximation. Moreover, while functions in high dimensions tend to have sparse coefficients in 
polynomial bases |15 [ I16 [ flH]. in low (in particular, one) dimensions polynomial coefficients usually 
exhibit rapid decay, but typically little sparsity. Since $\ell^1$-based techniques are computationally more
intensive than classical methods such as least-squares fitting, this raises the following question: is it
still worth using $\ell^1$ minimization techniques even when the data points are scattered and sparsity is not assured?

In §5 we present a general mathematical framework for answering these questions. We introduce
a linear approximation error analysis for the infinite-dimensional weighted $\ell^1$-minimization, which
allows it to be compared directly to existing techniques. In particular, we reduce (b) to a question
about the behaviour of three particular quantities that depend on the data points $\{t_n\}_{n=1}^N$, the
expansion basis $\{\phi_i\}_{i \in \mathbb{N}}$ and the weights $\{w_i\}_{i \in \mathbb{N}}$. Analyzing these quantities for each specific
problem setup provides an answer to (b).
To illustrate the various aspects of this framework, in the final part of this paper (§7 and §8)
we consider several examples, including one-dimensional Jacobi polynomial approximations from
scattered data points. In this case, we prove the following:

Theorem 1.1. For $\alpha, \beta > -1$ let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6) on
$[-1,1]$ and let $T = \{t_n\}_{n=1}^N$ be a set of $N$ scattered points in $[-1,1]$. Let $h$ be the density of the
points, defined by (2.2), and suppose that the truncation parameter

(1.3) 

for some $r \in \mathbb{N}$, where is the minimal separation between the points $T$. Fix weights $w = \{w_i\}_{i \in \mathbb{N}}$
with $w_i =$ for some $\gamma > \max\{1/2 - q, 0\}$, where $q$ is as in (2.7), and let $f = \sum_{i \in \mathbb{N}} x_i \phi_i$
with $x \in$ ^^(N) for Wi = y/i{wi)^. Then given measurements $y = \{f(t_n)\}_{n=1}^N$ one can compute, via
weighted $\ell^1$ minimization with weights $w$, an approximation $\hat{x}$ to the coefficients $x$ satisfying

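$$\|x - \hat{x}\| \lesssim \|x - P_M x\|_{1,w} + \|x - P_K x\|_{1,w}, \tag{1.4}$$

where $P_K x = \{x_1, \ldots, x_K, 0, 0, \ldots\}$ and $P_M x = \{x_1, \ldots, x_M, 0, 0, \ldots\}$, provided

$$h^{-1} \gtrsim M^2 \log M. \tag{1.5}$$

Moreover, the approximation $\tilde{f} = \sum_{i} \hat{x}_i \phi_i$ exactly interpolates the data: $\tilde{f}(t_n) = f(t_n)$, $\forall n$.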

This theorem demonstrates the key aspects of this paper. (i): the truncation parameter $K$
is determined independently of $f$, and its contribution to the overall error is clarified by (1.4).
In particular, if the data is roughly equally-spaced, then = $O(1/N)$ and it suffices to take
$K =$ for any $r > 0$. (ii): the approximation $\tilde{f}$ exactly interpolates $f$ in the absence of
noise. Note that noise can also be dealt with within our framework; we exclude it here for ease of
presentation. (iii): one gets an explicit criterion for how to choose the weights. (iv): the estimate
(1.4) for the approximation error depends only on the density $h$ of the deterministic points $T$, which
can be arbitrarily distributed in the domain.

As we discuss later, the estimates (1.4) and (1.5) demonstrate not just good performance of
this approach for scattered data, but in fact near-optimal performance. As we explain, no stable
method which is convergent as $h \to 0$ can exhibit an error bound depending on $x - P_M x$ measured
in some norm with M growing faster than as h —>• 00 . Our numerical results support this 

conclusion, and in fact show that weighted $\ell^1$ minimization performs rather better in practice and
similarly to an oracle least-squares fit. 

1.3 Relation to previous work 

A theory for reconstruction of sparse polynomials from random pointwise samples was developed
in a series of papers by Rauhut & Ward [29, 30]. Extension and application of this work in
uncertainty quantification has been considered in [H[TU[Ill[2ni|22l[25l[34l[35] . The use of weighted
$\ell^1$ minimization was introduced in [25, 30, 35]. We also use weighted $\ell^1$ minimization in this paper
for approximation from deterministic samples, yet for rather different purposes. Namely, weights
are chosen to regularize the minimization problem and remove the aliasing phenomenon. Typically,
this requires only very slow growth of the weights, which we quantify in the paper. Unlike other
works, we do not select weights based on a priori information about decay of the polynomial
coefficients. In fact, in §4 we will show that choosing weights in this way leads to inconsistent and
often negligible improvements when the samples are scattered and deterministic.
The infinite-dimensional framework we introduce in this paper is inspired in part by the framework
of infinite-dimensional compressed sensing in Hilbert spaces, due to A. C. Hansen and the
present author (see also [5] for an overview). A key difference is the need for weighted $\ell^1$
minimization in the present setup, due to the lack of continuity of the sampling operator. We note
also that our worst case analysis and comparison to least-squares fitting is similar to that presented
in m for generalized sampling in the Hilbert space setting.

The examples we use in this paper consist of algebraic and trigonometric polynomials respectively.
Polynomial approximations (so-called polynomial chaos expansions) are popular in areas
such as uncertainty quantification [21, 33]. However, we stress that the framework and analysis of
§2-5 of this paper is completely general, and can be applied to other bases. We mention several
other examples later. Our examples are also one-dimensional. We do this so as to better elucidate
the key ideas, without the notational complexities of the multivariate setting.

On this topic, we wish to clarify that the aim of this paper is not to propose weighted $\ell^1$
minimization as a panacea for function approximation. In the one-dimensional setting especially
there is a wealth of other techniques which are likely superior (see [UlTlElES] and references therein).
The advantages of weighted $\ell^1$ minimization come to the fore as the dimension increases, as has been
verified empirically in a number of works such as those mentioned previously. Instead, the purpose
of this paper is to first propose a framework for weighted $\ell^1$ minimization that overcomes some
existing issues, and second provide a more comprehensive analysis of its approximation capabilities
for fixed samples. We use the one-dimensional case to this end primarily for illustrative purposes.

2 Preliminaries 

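Let $D \subseteq \mathbb{R}^d$ be a domain and $\nu(t)$ an integrable nonnegative weight function satisfying $\int_D \nu(t)\,dt = 1$.
Let $L^2_\nu(D)$ be the space of complex-valued weighted square-integrable functions on $D$ with norm
$\|\cdot\|_{L^2_\nu}$ and inner product $\langle \cdot, \cdot \rangle_{L^2_\nu}$, and suppose that $\{\phi_i\}_{i \in \mathbb{N}} \subseteq L^2_\nu(D) \cap L^\infty(D)$ is a set of functions
that are orthonormal with respect to $\nu$. Note that

$$1 = \|\phi_i\|_{L^2_\nu} \leq \|\phi_i\|_{L^\infty}, \quad i \in \mathbb{N}, \tag{2.1}$$

where $\|\cdot\|_{L^\infty}$ is the uniform norm on $D$.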

2.1 Scattered data 

For $N \in \mathbb{N}$, let $T = \{t_n\}_{n=1}^N \subseteq D$ be a set of $N$ scattered data points. Our aim is to approximate a
function $f : D \to \mathbb{C}$ from the values $\{f(t_n)\}_{n=1}^N$. To ensure an accurate approximation, we require
a notion of closeness of the points $T$. We quantify this by defining the density

$$h = \sup_{t \in D} \min_{n=1,\ldots,N} |t - t_n|, \tag{2.2}$$

(also known as the fill distance [32]) where $|\cdot|$ is the Euclidean distance. In our analysis later, we
will present convergence rates of the various approximations in terms of $h \to 0$.
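As an illustration of definition (2.2), the following is a minimal Python sketch for the one-dimensional case $D = (-1,1)$ used in the examples of this paper. The function name `fill_distance` and the two point sets are hypothetical choices made for this sketch, and the points are assumed to lie in $[a,b]$. In one dimension the supremum is attained either at an endpoint of the domain or at the midpoint of the largest gap between neighbouring points.

```python
def fill_distance(points, a=-1.0, b=1.0):
    """Fill distance h = sup_{t in [a,b]} min_n |t - t_n| for points in [a, b]."""
    pts = sorted(points)
    if not pts:
        raise ValueError("need at least one point")
    # Worst case near the boundary: distance from each endpoint to its nearest sample.
    h = max(pts[0] - a, b - pts[-1])
    # Worst case in the interior: midpoint of the largest gap between neighbours.
    for left, right in zip(pts, pts[1:]):
        h = max(h, 0.5 * (right - left))
    return h


# Hypothetical point sets: an equispaced grid and a grid with a large hole.
equispaced = [-1.0 + 2.0 * n / 10 for n in range(11)]
clustered = [-1.0, -0.9, -0.8, 0.9, 1.0]

h_equispaced = fill_distance(equispaced)
h_clustered = fill_distance(clustered)
print(h_equispaced, h_clustered)
```

The clustered set has only slightly fewer than half as many points as the equispaced one, yet its fill distance is far larger, which is why the error estimates are stated in terms of $h$ rather than $N$.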

Associated to the points $T$ will also be a set of values $\tau_n > 0$, $n = 1, \ldots, N$, which we refer to as
quadrature weights. These are not to be confused with the optimization weights $w_i$ introduced later.
For simplicity, we define these as follows:

$$\tau_n = \int_{V_n} \nu(t)\,dt, \quad n = 1, \ldots, N, \qquad V_n = \{ t \in D : |t - t_n| \leq |t - t_m|, \ \forall m \neq n \}, \tag{2.3}$$

where $V_n$ are the Voronoi cells of the points $T$ in $D$. Given such quadrature weights, we define the
following sesquilinear form on $L^2_\nu(D) \cap L^\infty(D)$:

$$\langle f, g \rangle_h = \sum_{n=1}^N \tau_n f(t_n) \overline{g(t_n)},$$

and write $\|\cdot\|_h = \sqrt{\langle \cdot, \cdot \rangle_h}$ for the corresponding seminorm. Note that the quadrature weights
are not strictly necessary at this stage, but will play a pivotal role later in the paper.
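The Voronoi quadrature weights (2.3) and the discrete form are straightforward to compute in one dimension. The sketch below assumes $D = (-1,1)$ with the uniform density $\nu(t) = 1/2$ and distinct points; the function names `voronoi_weights` and `discrete_form` and the sample points are hypothetical. In this setting each Voronoi cell is the interval between the midpoints to the neighbouring points, so $\tau_n$ is half the cell length.

```python
def voronoi_weights(points, a=-1.0, b=1.0):
    """Weights tau_n = integral of nu over the Voronoi cell of t_n, with nu = 1/(b - a).

    The returned weights are ordered like sorted(points).
    """
    pts = sorted(points)
    # Cell boundaries: the domain endpoints and the midpoints between neighbours.
    edges = [a] + [0.5 * (lo + hi) for lo, hi in zip(pts, pts[1:])] + [b]
    density = 1.0 / (b - a)
    return [density * (hi - lo) for lo, hi in zip(edges, edges[1:])]


def discrete_form(f, g, points, weights):
    """Discrete sesquilinear form: sum_n tau_n f(t_n) conj(g(t_n))."""
    return sum(tau * f(t) * complex(g(t)).conjugate()
               for t, tau in zip(points, weights))


def one(t):
    return 1.0


# Hypothetical scattered points, sorted so they line up with the weights.
points = sorted([-0.7, -0.2, 0.1, 0.6, 0.95])
taus = voronoi_weights(points)
norm_sq = discrete_form(one, one, points, taus)
print(taus, norm_sq)
```

Because the Voronoi cells partition $D$ and $\nu$ integrates to one, the weights sum to one, so the discrete seminorm of the constant function matches its $L^2_\nu$ norm exactly.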

2.2 Weighted spaces 

For the remainder of this paper, $w = \{w_i\}_{i \in \mathbb{N}}$ will be a set of positive weights satisfying

$$w_i \geq \|\phi_i\|_{L^\infty} \geq 1, \quad \forall i \in \mathbb{N}, \tag{2.4}$$

where the latter inequality is due to (2.1). Define the weighted $\ell^p$ spaces by



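$W = \mathrm{diag}(w_1, w_2, \ldots)$, (2.5)

is the infinite diagonal matrix of weights. For the remainder of this paper, we will assume that the
function we wish to recover, $f = \sum_{i \in \mathbb{N}} x_i \phi_i$, has coefficients $x = \{x_i\}_{i \in \mathbb{N}} \in \ell^1_w(\mathbb{N})$.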

2.3 Other notation 

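For $\Delta \subseteq \mathbb{N}$ we let $P_\Delta : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ be the projection defined by

$$(P_\Delta x)_j = x_j, \ j \in \Delta, \qquad (P_\Delta x)_j = 0, \ j \notin \Delta.$$

If $\Delta = \{1, \ldots, K\}$ for some $K \in \mathbb{N}$, then we merely write $P_K$. We also let $\{e_j\}_{j \in \mathbb{N}}$ denote the
canonical basis of $\ell^2(\mathbb{N})$, so that

$$P_\Delta(\cdot) = \sum_{j \in \Delta} \langle \cdot, e_j \rangle e_j.$$

We will allow the slight abuse of notation throughout the paper in thinking of $P_\Delta x$ as both an
element of $\ell^2(\mathbb{N})$ and $\mathbb{C}^{|\Delta|}$. The intended meaning will be clear from the context.

If $x \in \mathbb{C}$, we let $\mathrm{sign}(x) = x/|x|$ be its complex sign with the convention that $\mathrm{sign}(0) = 0$. For
$x \in \ell^\infty(\mathbb{N})$ we let $\mathrm{sign}(x) = \{\mathrm{sign}(x_j)\}_{j \in \mathbb{N}} \in \ell^\infty(\mathbb{N})$ be the corresponding sequence of complex signs
of the entries of $x$. Finally, we use the notation $a \lesssim b$ to mean that there exists a constant $C$
independent of all relevant parameters such that $a \leq C b$.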

2.4 Examples 

As mentioned in §1.3, the examples we consider in this paper consist of one-dimensional functions
on bounded intervals, which we take to be $D = (-1,1)$ without loss of generality. Later we briefly
discuss extensions to higher dimensions, unbounded intervals and other approximation systems.
Example 2.1 If $f$ is smooth, then it is natural to approximate it using a basis of orthogonal
polynomials. Let

$$\nu(t) = c^{(\alpha,\beta)} (1 - t)^\alpha (1 + t)^\beta, \quad \alpha, \beta > -1,$$

be the Jacobi weight function, where $c^{(\alpha,\beta)} = \left( \int_{-1}^{1} (1 - t)^\alpha (1 + t)^\beta \,dt \right)^{-1}$ is a normalizing constant,
let $P^{(\alpha,\beta)}_j$ be the $j$th orthogonal polynomial with respect to this weight function, and let

$$\phi_j = \left( \kappa^{(\alpha,\beta)}_{j-1} \right)^{-1/2} P^{(\alpha,\beta)}_{j-1}, \tag{2.6}$$

be the corresponding orthonormal polynomial, where $\kappa^{(\alpha,\beta)}_j$ is as in (A.1). One can show that

$$\|\phi_j\|_{L^\infty} = O\left( j^{q + 1/2} \right), \quad j \to \infty, \quad \text{where } q = \max\{\alpha, \beta, -1/2\}. \tag{2.7}$$

See Appendix A (several other properties of Jacobi polynomials that will be needed later are also
listed therein). Since the weights $\{w_i\}_{i \in \mathbb{N}}$ introduced in §2.2 are required to satisfy (2.4), this means
that for this example they must grow at least as fast as $j^{q+1/2}$ as $j \to \infty$.

Example 2.2 Functions that are smooth and periodic can be efficiently approximated using trigonometric
polynomials. In this case, we have $\nu(t) = 1/2$ and define $\{\phi_j\}_{j \in \mathbb{Z}}$ to be the Fourier basis

$$\phi_j(t) = e^{i j \pi t}, \quad j \in \mathbb{Z}. \tag{2.8}$$

For convenience we index over $\mathbb{Z}$ rather than $\mathbb{N}$ in this example. Note that $\|\phi_j\|_{L^\infty} = 1$ and therefore
the weights $w_j$ in this example are required to satisfy $w_j \geq 1$, $\forall j \in \mathbb{Z}$.


3 Minimization problems 

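Define the operator $U : \ell^1_w(\mathbb{N}) \to \mathbb{C}^N$ by $Ux = \{\sqrt{\tau_n}\, g(t_n)\}_{n=1}^N$ where $g = \sum_{i \in \mathbb{N}} x_i \phi_i$. Note that this
operator is bounded. We shall also view $U \in \mathbb{C}^{N \times \infty}$ as the infinite matrix with entries

$$U_{n,i} = \sqrt{\tau_n}\, \phi_i(t_n), \quad n = 1, \ldots, N, \ i \in \mathbb{N}.$$

From now on, we make no distinction between the operator $U$ and the infinite matrix.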
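To make the action of $U$ concrete, the following Python sketch assembles finitely many columns of the matrix with entries $\sqrt{\tau_n}\,\phi_i(t_n)$ for the Fourier basis of Example 2.2 and checks that applying it to a coefficient vector reproduces the weighted samples of the corresponding function. The helper names, the eight equispaced midpoints and the choice of columns $j = -3, \ldots, 3$ are assumptions made for this sketch; for these particular points the Voronoi weights (2.3) with $\nu = 1/2$ all equal $1/N$.

```python
import cmath
import math


def fourier(j, t):
    """Fourier basis function phi_j(t) = exp(i*j*pi*t) on (-1, 1)."""
    return cmath.exp(1j * j * math.pi * t)


def sampling_matrix(points, taus, indices):
    """Matrix with entry [n][i] = sqrt(tau_n) * phi_i(t_n), restricted to `indices`."""
    return [[math.sqrt(tau) * fourier(j, t) for j in indices]
            for t, tau in zip(points, taus)]


# Hypothetical setup: N equispaced midpoints, whose Voronoi weights are all 1/N.
N = 8
points = [-1.0 + (2 * n + 1) / N for n in range(N)]
taus = [1.0 / N] * N
indices = list(range(-3, 4))  # columns j = -3, ..., 3

# Coefficients of g = phi_0 + 0.5 * phi_2, stored in the same column order.
x = [0.0] * len(indices)
x[indices.index(0)] = 1.0
x[indices.index(2)] = 0.5

U = sampling_matrix(points, taus, indices)
Ux = [sum(u * c for u, c in zip(row, x)) for row in U]

# Direct evaluation of the weighted samples sqrt(tau_n) * g(t_n).
direct = [math.sqrt(tau) * (fourier(0, t) + 0.5 * fourier(2, t))
          for t, tau in zip(points, taus)]
max_diff = max(abs(a - b) for a, b in zip(Ux, direct))
print(max_diff)
```

Scaling the rows by $\sqrt{\tau_n}$ means that the squared Euclidean norm of $Ux$ equals the discrete seminorm $\|g\|_h^2$ of §2.1, so each column here has unit norm.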

3.1 Infinite-dimensional weighted minimization 

Let $f = \sum_{i \in \mathbb{N}} x_i \phi_i$ be a function we wish to recover, where $x = \{x_i\}_{i \in \mathbb{N}} \in \ell^1_w(\mathbb{N})$. Suppose first
that we are given noiseless measurements of $f$, that is, $f(t_n)$, $n = 1, \ldots, N$, and let

$$y = \{\sqrt{\tau_n}\, f(t_n)\}_{n=1}^N \in \mathbb{C}^N,$$

be the vector of measurements normalized by the quadrature weights. To recover the infinite vector
$x$ of coefficients, and therefore $f$, we shall use weighted $\ell^1$ minimization. In order to avoid issues of
truncation (recall §1.1), we first formulate the following infinite-dimensional optimization problem:

$$\inf_{z \in \ell^1_w(\mathbb{N})} \|z\|_{1,w} \ \text{subject to} \ Uz = y. \tag{3.1}$$

If $\hat{x} \in \ell^1_w(\mathbb{N})$ is a minimizer of (3.1), then the corresponding approximation $\tilde{f}$ to $f$ is given by

$$\tilde{f} = \sum_{i \in \mathbb{N}} \hat{x}_i \phi_i. \tag{3.2}$$
In general, the measurements may be noisy. Suppose we are given

$$f(t_n) + e_n, \quad n = 1, \ldots, N,$$

where $|e_n| \leq \eta$, $n = 1, \ldots, N$, for some known $\eta \geq 0$. Write

$$y = \{\sqrt{\tau_n}\,(f(t_n) + e_n)\}_{n=1}^N \in \mathbb{C}^N. \tag{3.3}$$

In this case, we solve the inequality-constrained optimization problem

$$\inf_{z \in \ell^1_w(\mathbb{N})} \|z\|_{1,w} \ \text{subject to} \ \|Uz - y\| \leq \eta. \tag{3.4}$$

Note that (3.1) is just a special case of (3.4) corresponding to $\eta = 0$, and that both (3.1) and (3.4)
always have a solution, since the feasible set is nonempty (specifically, $x$ is always feasible). Note also
that solutions of (3.1) are interpolatory in the sense that $\tilde{f}(t_n) = f(t_n)$, $n = 1, \ldots, N$, whenever $\tilde{f}$ is
given by (3.2) with $\hat{x}$ being a minimizer of (3.1). Conversely, solutions of (3.4) yield approximations
$\tilde{f}$ that are interpolatory up to the noise magnitude $\eta$.

Throughout, we shall assume that the noise bound $\eta$ is known. If $\eta$ is unknown, one may
still solve the equality-constrained problem (3.1) in practice (or the inequality-constrained problem
(3.4) with some estimate of the noise). However, there are no known recovery guarantees for this
problem. See [19, Chpt. 11] for some work in this direction in the context of finite-dimensional
compressed sensing with random Gaussian matrices.

3.2 Truncation 

Unfortunately, neither problem (3.1) nor (3.4) is numerically solvable, since they require optimizing
over an infinite-dimensional space. Let $K \in \mathbb{N}$ be a truncation parameter. To form a computable
problem, we replace the space $\ell^1_w(\mathbb{N})$ with $\mathbb{C}^K$ and truncate the $N \times \infty$ matrix $U$ to the $N \times K$
matrix $UP_K$ formed by its first $K$ columns. Hence, we now consider the problem

$$\min_{z \in \mathbb{C}^K} \|z\|_{1,w} \ \text{subject to} \ UP_K z = y, \tag{3.5}$$

in the noiseless case, as well as its noisy analogue

$$\min_{z \in \mathbb{C}^K} \|z\|_{1,w} \ \text{subject to} \ \|UP_K z - y\| \leq \eta. \tag{3.6}$$

Both of these problems are finite dimensional, and can be solved using standard algorithms. If
$\hat{x} \in \mathbb{C}^K$ is a minimizer of either, then the approximation to $f$ is given by

$$\tilde{f} = \sum_{i=1}^K \hat{x}_i \phi_i. \tag{3.7}$$

Note that neither (3.5) nor (3.6) modifies the constraints of the infinite-dimensional problems (3.1)
and (3.4). In particular, (3.5) remains interpolatory and (3.6) is interpolatory up to the noise.

With this in hand, the general idea is to choose $K$ in such a way as to ensure closeness of the
solutions of the finite-dimensional problems (3.5) and (3.6) to those of the infinite-dimensional
problems (3.1) and (3.4). In §6 we shall show that it is possible to choose $K$ in a function-independent
manner, thus overcoming issue (i) of §1.1.



Remark 3.1 It is important that $K$ be chosen suitably large. To see why, consider Example 2.1.
If $K = N$ then (3.5) has a unique solution and $\tilde{f}$ is just the polynomial interpolant of $f$ of degree
$N - 1$. However, for equispaced (or more generally, scattered) data this is well known to be a
poor approximation to $f$, since it suffers from Runge's phenomenon. The approximations $\tilde{f}$ will
generally diverge and the matrix $UP_N$ will have an exponentially-large condition number. On the
other hand, if one replaces $UP_N$ by $UP_K$ with $K > N$, then provided $K$ is sufficiently large the
singular values of $UP_K$ are provably bounded away from zero (see

3.3 Comparison to least-squares fitting 

Classical least-squares fitting corresponds to the approximation $\tilde{f} = \sum_{i=1}^M \hat{x}_i \phi_i$, where $\hat{x}$ is the
solution of the problem

$$\min_{z \in \mathbb{C}^M} \|UP_M z - y\|. \tag{3.8}$$

An important difference between least-squares fitting and weighted $\ell^1$ minimization is the choice of
the truncation parameters. In the former, the parameter $M$ affects both the approximation error
$\|f - \tilde{f}\|$ and the robustness of the approximation. In practice, $M$ must be chosen suitably small in
relation to $1/h$ to ensure stability and robustness, while also being sufficiently large to give a good
approximation. The issue of how to best choose $M$, which we discuss further in §5.4 for the specific
case of polynomials, is nontrivial. While there are many known theoretical estimates for how $M$
should scale for different function systems and datasets (see, for example, [3 El [H ini [231 [21] 
and references therein), optimal selection of M is difficult in practice. In particular, standard 
theoretical guarantees usually only determine the asymptotic order of M with 1/h. Constants, if 
known, tend to be overly pessimistic. This problem becomes more acute in multiple dimensions, 
since the ordering of the basis functions plays an increasingly important role. 

Conversely, the truncation parameter $K$ in the weighted $\ell^1$ minimization formulations (3.5) and
(3.6) plays a completely different role: namely, it allows one to approximately compute solutions of
the infinite-dimensional problems (3.1) and (3.4). Once $K$ is large enough so that the truncation
error when passing to the finite-dimensional problems (3.5) and (3.6) is negligible, changing $K$
has little effect on the accuracy of the solution $\tilde{f}$. Note also that (3.5) leads to interpolatory
approximations, which is not the case for the least-squares fit (3.8).

4 The need for weights 

Before analyzing (3.1) and (3.6) in detail, we first examine the role weights play in the minimization.
In particular, we shall show that it is in general necessary for the ratios

$$w_i / \|\phi_i\|_{L^\infty} \to \infty, \quad i \to \infty, \tag{4.1}$$

in order for the weighted $\ell^1$ minimization problems to give convergent approximations to $f$ in the
case of fixed, deterministic data. If this is not the case, then the minimization problem can have
multiple solutions which alias the data, leading in general to poor approximations.

We shall prove later that (4.1) is sufficient to guarantee a good approximation. To demonstrate
its general necessity, we consider Example 2.2. Recall that $\|\phi_j\|_{L^\infty} = 1$ in this case.

Proposition 4.1. Let $D$, $\nu$ and $\{\phi_j\}_{j \in \mathbb{Z}}$ be as in (2.8). Let $T = \{t_n\}_{n=1}^N$ be a set of $N$ data points
such that $t_n P \in \mathbb{Z}$ for some $P \in \mathbb{N}$ and all $n = 1, \ldots, N$. Suppose that $x \in \ell^1(\mathbb{Z})$ is a solution of

$$\inf_{z \in \ell^1(\mathbb{Z})} \|z\|_1 \ \text{subject to} \ Uz = y, \tag{4.2}$$

where $U = \{\phi_j(t_n)\}_{n=1,j=-\infty}^{N,\infty}$ and $y \in \mathbb{C}^N$. Then every shift of the entries of $x$ by a multiple of $2P$
is also a solution of (4.2). That is, for every $k \in \mathbb{Z}$, the element $z \in \ell^1(\mathbb{Z})$ given by

$$z_j = x_{j - 2kP}, \quad j \in \mathbb{Z}, \tag{4.3}$$

is a solution of (4.2).

Proof. Shifting the entries of $x$ does not affect its norm, therefore $\|z\|_1 = \|x\|_1$. Moreover, since
$e^{2\pi i k P t_n} = 1$ whenever $t_n P \in \mathbb{Z}$,

$$(Uz)_n = \sum_{j \in \mathbb{Z}} x_{j-2kP}\, e^{i j \pi t_n} = \sum_{j \in \mathbb{Z}} x_j\, e^{i (j + 2kP) \pi t_n} = \sum_{j \in \mathbb{Z}} x_j\, e^{i j \pi t_n} = (Ux)_n = y_n, \quad n = 1, \ldots, N.$$

Hence $z$ is feasible for (4.2), and therefore a minimizer. □

Figure 1: Aliasing in $\ell^1$ minimization. The black line is the function $f(t) = \phi_0 = 1$, the red dots are the
data points with $N = 11$ (left) and $N = 21$ (right) and the blue curve is the aliased solution $\phi_{10}$ (left) and
$\phi_{20}$ (right). Although these solutions interpolate $f$, they do not approximate $f$ in between the data points.

The absence of quadrature weights does not change the conclusion here, since we consider 
the equality-constrained minimization. We could also consider the inequality-constrained problem 
with much the same result, but we present the equality-constrained problem to show that the 
phenomenon is not due in any way to the increased size of the feasible set when $\eta > 0$.

Taken on its own, the fact that (4.2) has multiple solutions may not be alarming. After all,
convex optimization problems often do. However, in this case the effect is catastrophic. Consider
the problem where $f = \phi_0 = 1$ so that its coefficients are $x = e_0$. Since $y_n = f(t_n) = 1$ in this case,
if $z \in \ell^1(\mathbb{Z})$ is feasible for (4.2) then

$$1 = |(Uz)_n| = \left| \sum_{j \in \mathbb{Z}} z_j e^{i j \pi t_n} \right| \leq \|z\|_1.$$

Since $\|x\|_1 = 1$ attains this lower bound, $x$ itself is a solution of (4.2), and by Proposition 4.1 so is
every shift $z = e_{2kP}$ of $x$ by a multiple of $2P$. However, for all these solutions with $k \neq 0$ one has
$\|x - z\|_1 = 2$. Thus, although there is one solution of (4.2) which recovers $x$ (and therefore $f$) exactly,
there are also infinitely many solutions of (4.2) that give meaningless approximations to $x$.
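The aliasing described above is easy to reproduce numerically. The sketch below takes the hypothetical choice $P = 5$ with the $11$ points $t_n = n/P$, $n = -P, \ldots, P$, so that $t_n P \in \mathbb{Z}$, and compares the Fourier mode $\phi_{2kP}$ for $k = 1$ with $f = \phi_0 = 1$, first at the data points and then halfway between two neighbouring points. The variable names are illustrative only.

```python
import cmath
import math


def phi(j, t):
    """Fourier basis function phi_j(t) = exp(i*j*pi*t), as in (2.8)."""
    return cmath.exp(1j * j * math.pi * t)


P = 5
points = [n / P for n in range(-P, P + 1)]  # 11 points with t_n * P an integer
k = 1
alias = 2 * k * P                            # index of the shifted solution e_{2kP}

# At the data points the aliased mode is indistinguishable from phi_0 = 1 ...
max_data_err = max(abs(phi(alias, t) - phi(0, t)) for t in points)

# ... but halfway between two neighbouring points it takes the value -1.
t_mid = 0.5 / P
midpoint_err = abs(phi(alias, t_mid) - phi(0, t_mid))
print(max_data_err, midpoint_err)
```

Up to rounding, the first quantity vanishes while the second equals $2$, which is the largest possible pointwise distance between two functions of modulus one.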

This effect is due to aliasing the data by higher-frequency Fourier modes. The shifted solutions
of Proposition 4.1 correspond to the functions $\phi_{2kP}$, $k \in \mathbb{Z}$, which interpolate $f$ at the data points
but oscillate with frequency proportional to $kP$ in between the data points (see Fig. 1). Of course,
in the simplified scenario described here the aliasing problem could have been avoided by solving
a truncated problem with truncation $K = P$. However, as discussed, this will not work in the
general case when truncation with $K > N$ is required in order to control the tail and ensure (in
the noiseless case) an interpolatory solution.

Figure 2: Recovery of the function $f(t) = \cos(\pi t) \exp(\sin(\pi t))$ (shown in black) from $N = 20$ data points
(shown in red). The blue curve is the function $\tilde{f}$ obtained from (weighted) $\ell^1$ minimization using the Fourier
basis with weights Wi = 1 (left), Wi = 1 + (middle) and = 1 + (right). Top row: equality-constrained
minimization (ED. Bottom row: inequality-constrained minimization (3.4) with 77 = 10 ^.

Now suppose that weights $w_i$ are added, and (4.2) is replaced by

$$\inf_{z \in \ell^1_w(\mathbb{Z})} \|z\|_{1,w} \ \text{subject to} \ Uz = y. \tag{4.4}$$

Assume the weights $w$ satisfy $w_{-i} = w_i$, $i \in \mathbb{N}$ and $1 \leq w_0 < w_1 < w_2 < \cdots$, and consider the case
of $f = \phi_0$ once more. Then none of the aliased solutions of (4.2) are solutions of (4.4), since they
all have larger weighted $\ell^1$-norm: $\|e_{2kP}\|_{1,w} = w_{2kP} > w_0 = \|e_0\|_{1,w}$. Hence, adding growing weights
regularizes the problem (4.4) and removes the bad, aliased solutions of (4.2). This improvement is
illustrated numerically in Fig. 2. This figure also shows that this phenomenon is not limited to the
equality-constrained minimization problem.
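The tie-breaking effect of the weights can be checked directly on the candidate solutions $e_{2kP}$. In the sketch below the weights $w_j = (1 + |j|)^{1/2}$ are a hypothetical choice that is symmetric, strictly increasing in $|j|$ and satisfies $w_0 = 1$; they are not the weights used in the figures. Sparse vectors are stored as dictionaries mapping an index to its entry.

```python
def weighted_l1_norm(coeffs, weight):
    """Weighted l1 norm sum_j w_j |z_j| of a finitely supported z given as {index: value}."""
    return sum(weight(j) * abs(v) for j, v in coeffs.items())


def unit(j):
    """Canonical basis vector e_j stored sparsely."""
    return {j: 1.0}


def flat(j):
    """Unweighted case: w_j = 1 for all j."""
    return 1.0


def growing(j):
    """Hypothetical slowly growing symmetric weights w_j = (1 + |j|)^(1/2)."""
    return (1.0 + abs(j)) ** 0.5


P = 5
candidates = [2 * k * P for k in range(-2, 3)]  # indices 2kP: -20, -10, 0, 10, 20

flat_norms = {j: weighted_l1_norm(unit(j), flat) for j in candidates}
growing_norms = {j: weighted_l1_norm(unit(j), growing) for j in candidates}
best = min(growing_norms, key=growing_norms.get)
print(flat_norms, growing_norms, best)
```

Without weights all five candidates have the same norm, so the minimization cannot distinguish them; with growing weights the unaliased vector $e_0$ is the unique candidate of smallest norm.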

Remark 4.2 The use of weighted $\ell^1$ minimization strategies has been occasionally motivated by the
desire to match the decay of the true coefficients $x$ of the unknown function and thereby obtain
better approximations [25, 30]. However, this is not the primary role the weights play in this
setting. To demonstrate this point, in Fig. 3 we plot the error for weighted $\ell^1$ minimization using
Chebyshev polynomials for a number of different test functions and weighting strategies. As can
be seen, increasing the weights does not lead to a consistent improvement across all functions, even
though all functions used (besides the final one) are infinitely smooth and thus have coefficients
which decay superalgebraically fast. While weights might help in some small way by promoting
smoothness, these results suggest that the effect on the approximation error is much less than the
role they play in regularizing the problem. Furthermore, higher weights may well cause problems
for numerical solvers, due to the increasing ill-conditioning of the $N \times K$ system matrix $UW^{-1}P_K$.
Figure 3: Weighted $\ell^1$ minimization with equispaced data and Chebyshev polynomials. The error $\|f - \tilde{f}\|_{L^\infty}$
against $N$ is shown for the choice $w_i = i^\gamma$, where $\gamma = 0.0, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5$ (thickest
to thinnest). The truncation parameter $K = 4N$ was used. As with all numerical results in this paper, the
minimization problem (3.5) was solved using the CVX optimization package.


We note in passing that this situation is quite unlike the case of weighted $\ell^2$ minimization
(see, for example, mm), in which case the error for a smooth function decays only algebraically 
fast at a rate dependent on the algebraic growth rate of the weights. Thus, for $\ell^2$ minimization,
rapidly-growing weights promote smoothness. Conversely, we will prove later that for weighted
$\ell^1$ minimization the error decays superalgebraically fast for all smooth functions whenever the weights
meet a minimum growth condition (Theorem 7.1).


5 Approximation error of weighted minimization 

The remainder of this paper is devoted to the analysis of the problems (3.5) and (3.6). In this section,
we present a linear approximation error analysis. Truncation and the choice of the parameter
$K$ are addressed in §6, and in §7 and §8 we apply these results to the examples of §2.4.

5.1 A general recovery result 

We first require the following result, which bounds the error of weighted minimization subject 
to the existence of a particular dual vector u: 

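Lemma 5.1. Let $\Delta \subseteq \{1,\ldots,K\}$. Suppose that

\[ (i):\ \|P_\Delta U^* U P_\Delta - P_\Delta\| \le \alpha, \qquad (ii):\ \max_{i \notin \Delta} \{ \|U e_i\| / w_i \} \le \beta, \]

and that there exists a vector $u = W^{-1} U^* u'$ for some $u' \in \mathbb{C}^N$ such that

\[ (iii):\ \|W(P_\Delta u - \mathrm{sign}(P_\Delta x))\| \le \gamma, \qquad (iv):\ \|P_\Delta^\perp u\|_\infty \le \theta, \qquad (v):\ \|u'\| \le L\sqrt{s}, \]

where $s = \sum_{i \in \Delta} w_i^2$, for constants $0 \le \alpha, \theta < 1$ and $\beta, \gamma, L \ge 0$ satisfying

\[ \frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)} < 1. \]

Let $\hat{x}$ be a minimizer of (3.6). Then, if $\tilde{x}$ is feasible for (3.6), i.e. $\|U P_K \tilde{x} - y\| \le \eta$, the
error estimate

\[ \|x - \hat{x}\| \le 2\left(C_1 + C_2 L\sqrt{s}\right)\eta + C_2\left(2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right), \tag{5.1} \]

holds, where $C_1 = \left(1 + \frac{\gamma}{1-\theta}\right) C_0$, $C_2 = \left(1 + \frac{\gamma}{1-\theta}\right)\frac{\beta}{1-\theta} C_0 + \frac{1}{1-\theta}$ and
$C_0 = \left(1 - \frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)}\right)^{-1} \frac{\sqrt{1+\alpha}}{1-\alpha}$.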

Recall that the problem (3.5) can be viewed as a special case of (3.6) corresponding to the case
$\eta = 0$. Hence this result considers only (3.6). The proof of this lemma is given in §5.5.

In practice, we shall use the following result, which is a straightforward consequence of Lemma 5.1:

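Lemma 5.2. Let $\Delta \subseteq \{1,\ldots,K\}$. Suppose that there are constants $0 \le \alpha, \theta < 1$ such that

\[ (a):\ \|P_\Delta U^* U P_\Delta - P_\Delta\| \le \alpha, \qquad (b):\ \|P_\Delta^\perp W^{-1} U^* U P_\Delta A^{-1} W P_\Delta \mathrm{sign}(x)\|_\infty \le \theta, \]

where $A = P_\Delta U^* U P_\Delta$. If $\hat{x}$ is a minimizer of (3.6) and $s = \sum_{i \in \Delta} w_i^2$, then

\[ \|x - \hat{x}\| \le 2 C_3 \left(1 + \frac{C_3 + 1}{1-\theta}\sqrt{s}\right)\eta + \frac{C_3 + 1}{1-\theta}\left(2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w}\right), \tag{5.2} \]

where $\tilde{x} \in P_K(\ell^1_w(\mathbb{N}))$ is any feasible solution of (3.6) and $C_3 = \frac{\sqrt{1+\alpha}}{1-\alpha}$.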

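Proof. We apply Lemma 5.1 with $u = W^{-1} U^* U P_\Delta A^{-1} W P_\Delta \mathrm{sign}(x)$. Note that (a) and (b) imply
(i) and (iv) respectively. Also, by construction, $P_\Delta u = P_\Delta \mathrm{sign}(x)$ and therefore (iii) holds with
$\gamma = 0$. Now consider (ii). By definition

\[ \|U e_i\|^2 = \sum_{n=1}^N \tau_n |\phi_i(t_n)|^2 \le \|\phi_i\|_{L^\infty}^2 \sum_{n=1}^N \tau_n \le w_i^2, \]

where the last inequality is due to (2.4) and the fact that $\sum_{n=1}^N \tau_n = 1$ when the weights $\tau_n$ are
given by (2.3). Hence (ii) holds with $\beta = 1$. Finally, observe that

\[ \|u'\| = \|U P_\Delta A^{-1} W P_\Delta \mathrm{sign}(x)\| \le \|U P_\Delta\| \, \|A^{-1}\| \, \|W P_\Delta \mathrm{sign}(x)\| \le \frac{\sqrt{1+\alpha}}{1-\alpha}\sqrt{s}, \]

where the final inequality follows from (a). Hence (v) holds with $L = \frac{\sqrt{1+\alpha}}{1-\alpha}$. □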
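
The construction of the dual vector in this proof is purely algebraic, so it can be checked on a small example. The sketch below is illustrative only: the matrix `U` is a random stand-in for the sampling matrix, and the weights, the index set `Delta` and the vector `x` are hypothetical. It forms $u' = U P_\Delta A^{-1} W P_\Delta \mathrm{sign}(x)$ and $u = W^{-1} U^* u'$, then evaluates the quantities appearing in conditions (iii), (b) and (v).

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 12, 30
U = rng.standard_normal((N, K)) / np.sqrt(N)   # random stand-in for the sampling matrix
w = 1.0 + np.arange(K, dtype=float)            # hypothetical weights w_i >= 1
x = rng.standard_normal(K)                     # real coefficient vector, for simplicity
Delta = np.array([0, 2, 5, 7])                 # hypothetical index set (0-based)

U_D = U[:, Delta]                              # U P_Delta
A = U_D.T @ U_D                                # P_Delta U* U P_Delta restricted to Delta
sgn = np.sign(x[Delta])                        # P_Delta sign(x)

u_prime = U_D @ np.linalg.solve(A, w[Delta] * sgn)   # u' = U P_Delta A^{-1} W P_Delta sign(x)
u = (U.T @ u_prime) / w                              # u  = W^{-1} U* u'

# Condition (iii) holds with gamma = 0: u agrees with sign(x) on Delta.
on_support_gap = np.max(np.abs(u[Delta] - sgn))

# The quantity bounded by theta in condition (b): the off-support sup-norm of u.
off = np.setdiff1d(np.arange(K), Delta)
theta = np.max(np.abs(u[off]))

# Condition (v): ||u'|| <= ||U P_Delta|| ||A^{-1}|| sqrt(s) with s = sum of w_i^2 over Delta.
s = np.sum(w[Delta] ** 2)
bound = np.linalg.norm(U_D, 2) * np.linalg.norm(np.linalg.inv(A), 2) * np.sqrt(s)
```

For a random `U` the value of `theta` need not be below one; the lemma only applies when it is.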


5.2 Linear approximation error preliminaries 

Lemma 5.2 gives conditions under which $x$ is approximated with error depending on the magnitude
of its coefficients $x_i$ outside some set $\Delta$. This depends on the conditioning of the corresponding
submatrix (condition (a)) and the off-support magnitude of the vector $u$ (condition (b)). Under a
sparsity condition on the coefficients $x$, and appropriate random choices of the points $T$, one may
use this result to prove estimates relating the number of measurements to the sparsity [1]. However,
as discussed earlier, in practice the data points may not arise from such distributions. In this section,
we consider arbitrary deterministic scattered data points and present a linear error analysis. This
follows by setting $\Delta = \{1,\ldots,M\}$ for some $M \le K$ and using Lemma 5.2 to determine how large
$M$ can be chosen in relation to the density $h$ of the points. Doing this will allow us to make a
direct comparison with other techniques (see Remark 5.9).

Since such statements will be asymptotic in $h \to 0$, we first require an additional assumption.
Let $H$ be a subspace of $L^2_\nu(D) \cap L^\infty(D)$ which is closed under multiplication and complex
conjugation and such that $f \in H$ and $\{\phi_i\}_{i \in \mathbb{N}} \subseteq H$. We now assume that the points $T$ satisfy

\[ \sum_{n=1}^N \tau_n g(t_n) \to \int_D g(t)\nu(t)\,dt, \quad h \to 0, \quad \forall g \in H. \tag{5.3} \]

In particular, since $H$ is closed under multiplication and complex conjugation, one has that

\[ \langle f, g \rangle_h \to \langle f, g \rangle_{L^2_\nu}, \quad h \to 0, \quad \forall f, g \in H. \]

Hence the discrete inner product is equivalent to $\langle \cdot, \cdot \rangle_{L^2_\nu}$ on finite-dimensional subspaces of $H$ for
sufficiently small $h$. Note that this assumption is by no means stringent. For example, if $D = (-1,1)$
we may take $H$ to be the space of functions for which $|f(t)|^2 \nu(t)$ is Riemann integrable.
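
As an illustration of assumption (5.3), the sketch below evaluates the discrete inner product on equispaced grids and compares it with the exact integral. It is a sketch under stated assumptions: $D = (-1,1)$, the weight function is taken to be $\nu = 1/2$, and the quadrature weights $\tau_n$ are taken to be the $\nu$-measures of the Voronoi cells of the points; the helper names are hypothetical.

```python
import numpy as np

def voronoi_weights(t):
    """tau_n = nu-measure of the Voronoi cell of t_n in [-1, 1], assuming nu = 1/2.

    The points t are assumed to be sorted in increasing order.
    """
    t = np.asarray(t, dtype=float)
    mid = 0.5 * (t[1:] + t[:-1])                  # cell boundaries between neighbours
    edges = np.concatenate(([-1.0], mid, [1.0]))
    return 0.5 * np.diff(edges)

def discrete_inner_product(f, g, t):
    """<f, g>_h = sum_n tau_n f(t_n) conj(g(t_n))."""
    tau = voronoi_weights(t)
    return np.sum(tau * f(t) * np.conj(g(t)))

f = lambda t: t
g = lambda t: t
exact = 1.0 / 3.0                                 # integral of t^2 * (1/2) over (-1, 1)

errors = {N: abs(discrete_inner_product(f, g, np.linspace(-1.0, 1.0, N)) - exact)
          for N in (10, 100, 1000)}
```

The error shrinks as the grid is refined, which is the behaviour that (5.3) asks for.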

Before stating our main result (Theorem 5.6), we first require some additional notation and
technical lemmas. For $h > 0$ and $M, R \in \mathbb{N}$ we define the quantities

\[ E_2(h,M) = \|P_M - P_M U^* U P_M\|, \qquad E_\infty(h,M) = \|P_M - P_M U^* U P_M\|_\infty, \tag{5.4} \]

and

\[ F(h,M,R) = \|P_R^\perp W^{-1} U^* U P_M\|_\infty. \tag{5.5} \]

We also set $E(h,M) = \max\{E_2(h,M), E_\infty(h,M)\}$.

Lemma 5.3. For fixed $M$, we have $E(h,M) \to 0$ as $h \to 0$.

Proof. Since all norms on $\mathbb{C}^M$ are equivalent, it suffices to show that $(U^*U)_{ij} \to \delta_{ij}$ for each
$i,j = 1,\ldots,M$ as $h \to 0$. However, by (5.3) and orthogonality of the $\phi_j$, we have $(U^*U)_{ij} =
\langle \phi_j, \phi_i \rangle_h \to \langle \phi_j, \phi_i \rangle_{L^2_\nu} = \delta_{ij}$, as required. □

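Lemma 5.4. Suppose that the weights $w_i = z_i \|\phi_i\|_{L^\infty}$ for some $z_i \ge 1$. Then

\[ F(h,M,R) \le \frac{\sqrt{M}\sqrt{1 + E(h,M)}}{\min_{i > R}\{z_i\}}. \]

Proof. Let $x \in P_M(\ell^\infty(\mathbb{N}))$, $\|x\|_\infty = 1$ be arbitrary. Then $(W^{-1} U^* U P_M x)_i = \langle g, \phi_i \rangle_h / w_i$, where
$g = \sum_{j=1}^M x_j \phi_j$. By the Cauchy-Schwarz inequality,

\[ \|P_R^\perp W^{-1} U^* U P_M x\|_\infty \le \sup_{i > R} \frac{1}{w_i} \|g\|_h \|\phi_i\|_h. \]

Now $\|g\|_h^2 = \langle x, P_M U^* U P_M x \rangle \le (1 + E(h,M)) \|x\|^2 \le M (1 + E(h,M))$. Also
$\|\phi_i\|_h \le \|\phi_i\|_{L^\infty} = w_i / z_i$ and therefore

\[ \|P_R^\perp W^{-1} U^* U P_M\|_\infty \le \frac{\sqrt{M}\sqrt{1 + E(h,M)}}{\min_{i > R}\{z_i\}}, \]

as required. □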
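
Lemma 5.5. Suppose that the weights $w_i = z_i \|\phi_i\|_{L^\infty}$ where $z_i \ge 1$ and $z_i \to \infty$ as $i \to \infty$. Then
for any $0 < \varepsilon < 1/2$ and any $M \in \mathbb{N}$ there exist an $R \in \mathbb{N}$ and $h > 0$ depending only on $M$ and $\varepsilon$
such that

\[ E(h,M) \le \varepsilon, \qquad E(h,R) \le \varepsilon \, \frac{\min_{M < i \le R}\{w_i\}}{\max_{1 \le i \le M}\{w_i\}}, \qquad F(h,M,R) \le \frac{\varepsilon}{\max_{1 \le i \le M}\{w_i\}}. \tag{5.6} \]

Proof. By Lemma 5.3 we can find an $h_1$ such that $E(h,M) \le \varepsilon$ for all $h \le h_1$, thus satisfying the
first condition in (5.6). Using Lemma 5.4 we note that

\[ F(h,M,R) \le \frac{\sqrt{2M}}{\min_{i > R}\{z_i\}}, \quad \forall h \le h_1. \]

To satisfy the third condition in (5.6), we pick $R$ sufficiently large so that

\[ \inf_{i > R}\{z_i\} \ge \sqrt{2M} \max_{1 \le i \le M}\{w_i\} / \varepsilon. \]

We now pick $h_2$ sufficiently small so that

\[ E(h,R) \le \varepsilon \, \frac{\min_{M < i \le R}\{w_i\}}{\max_{1 \le i \le M}\{w_i\}}, \quad \forall h \le h_2, \]

and then set $h = \min\{h_1, h_2\}$. □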


5.3 Main result 


In order to present our main result, we first introduce the following notation:

\[ T_{h,K,\eta}(x) = \inf\left\{ \|x - \tilde{x}\|_{1,w} : \tilde{x} \in P_K(\ell^1_w(\mathbb{N})),\ \|U P_K \tilde{x} - y\| \le \eta \right\}. \tag{5.7} \]

Theorem 5.6. Suppose that the weights $w_i = z_i \|\phi_i\|_{L^\infty}$ where $z_i \ge 1$ and $z_i \to \infty$ as $i \to \infty$. For
$0 < \varepsilon < 1/2$, let $h > 0$ and $M, R \in \mathbb{N}$ be such that (5.6) holds. Then there exists a constant $C(\varepsilon)$
such that, whenever $\hat{x}$ is a minimizer of (3.6),

\[ \|x - \hat{x}\| \le C(\varepsilon)\left[ (1 + \|P_M w\|)\eta + \|P_M^\perp x\|_{1,w} + T_{h,K,\eta}(x) \right], \tag{5.8} \]

where $T_{h,K,\eta}(x)$ is as in (5.7). Moreover, $\lim_{\varepsilon \to 0^+} C(\varepsilon) = 4$.

Note that the weights condition is equivalent to (4.1), which was shown to be necessary in §4.
Theorem 5.6 shows that the same condition is also sufficient. We note also that the truncation
parameter $K$ only influences the term $T_{h,K,\eta}(x)$. We defer the detailed analysis of this term to §6.

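Proof. We use Lemma 5.2 with $\Delta = \{1,\ldots,M\}$. Note first that

\[ \|P_M U^* U P_M - P_M\| = E_2(h,M) \le E(h,M) \le \varepsilon < 1, \tag{5.9} \]

and therefore (a) holds with $\alpha \le \varepsilon$. Now consider (b). Write

\[ u = W^{-1} U^* U P_M A^{-1} W P_M \mathrm{sign}(x), \tag{5.10} \]

so that (b) is equivalent to $\|P_M^\perp u\|_\infty \le \theta$. We have

\[ \|P_M^\perp u\|_\infty = \max\left\{ \|P_R P_M^\perp u\|_\infty, \|P_R^\perp u\|_\infty \right\}. \tag{5.11} \]

We consider both terms separately. For $\|P_R P_M^\perp u\|_\infty$, (5.10) gives

\[ \|P_R P_M^\perp u\|_\infty \le \|P_R P_M^\perp W^{-1}\|_\infty \|P_R P_M^\perp U^* U P_M\|_\infty \|A^{-1}\|_\infty \|P_M W\|_\infty \|\mathrm{sign}(x)\|_\infty
\le \frac{\max_{1 \le i \le M}\{w_i\}}{\min_{M < i \le R}\{w_i\}} \|P_R P_M^\perp U^* U P_M\|_\infty \|A^{-1}\|_\infty. \]

Note that $\|I - A\|_\infty \le E(h,M)$ and therefore $\|A^{-1}\|_\infty \le \frac{1}{1 - E(h,M)}$. Moreover

\[ \|P_R P_M^\perp U^* U P_M\|_\infty = \|P_R P_M^\perp (I - U^* U) P_M\|_\infty \le \|P_R (I - U^* U) P_R\|_\infty \le E(h,R), \]

where the first equality is due to the fact that $P_M^\perp P_M = 0$. Therefore, we obtain

\[ \|P_R P_M^\perp u\|_\infty \le \frac{\max_{1 \le i \le M}\{w_i\}}{\min_{M < i \le R}\{w_i\}} \left( \frac{E(h,R)}{1 - E(h,M)} \right) \le \frac{\varepsilon}{1 - \varepsilon}. \tag{5.12} \]

Now consider the other term in (5.11). By (5.10) and the definition of $F(h,M,R)$,

\[ \|P_R^\perp u\|_\infty \le \|P_R^\perp W^{-1} U^* U P_M\|_\infty \|A^{-1}\|_\infty \|P_M W\|_\infty \le \frac{F(h,M,R) \max_{1 \le i \le M}\{w_i\}}{1 - E(h,M)} \le \frac{\varepsilon}{1 - \varepsilon}. \]

Combining this with (5.12) and substituting into (5.11) yields $\theta \le \frac{\varepsilon}{1-\varepsilon} < 1$. The result now follows
immediately from Lemma 5.2. □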


5.4 Comparison with least-squares fitting 

The following result is standard (a proof is given for completeness): 

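Theorem 5.7. For $0 < \varepsilon < 1$ let $M \in \mathbb{N}$ and $h > 0$ be such that $E_2(h,M) \le \varepsilon < 1$, where $E_2(h,M)$
is as in (5.4). Then (3.8) has a unique solution $\hat{x}$, and this satisfies

\[ \|x - \hat{x}\| \le \left(1 + \frac{1}{\sqrt{1-\varepsilon}}\right) \|x - P_M x\|_{1,w} + \frac{\eta}{\sqrt{1-\varepsilon}}, \tag{5.13} \]

for any $w = \{w_i\}_{i \in \mathbb{N}}$ with $w_i \ge \|\phi_i\|_{L^\infty}$.

Proof. Observe that, for $z \in P_M(\ell^2(\mathbb{N}))$,

\[ \|U P_M z\|^2 = z^* P_M U^* U P_M z = \|z\|^2 - z^* (P_M - P_M U^* U P_M) z \ge (1 - E_2(h,M)) \|z\|^2. \]

Hence $U P_M$ has full rank since $E_2(h,M) \le \varepsilon < 1$, and its minimum singular value satisfies $\sigma_{\min} \ge
\sqrt{1 - \varepsilon}$. This implies that $\hat{x}$ is unique and is given by $\hat{x} = (U P_M)^\dagger y$. Hence

\[ \hat{x} - P_M x = (U P_M)^\dagger \left( U x + \{\sqrt{\tau_n} e_n\}_{n=1}^N \right) - P_M x = (U P_M)^\dagger \left( U(x - P_M x) + \{\sqrt{\tau_n} e_n\}_{n=1}^N \right). \]

Therefore

\[ \|\hat{x} - P_M x\| \le \frac{1}{\sqrt{1-\varepsilon}} \left( \|U(x - P_M x)\| + \eta \right) \le \frac{1}{\sqrt{1-\varepsilon}} \left( \|x - P_M x\|_{1,w} + \eta \right). \]

Since $\|x - \hat{x}\| \le \|x - P_M x\| + \|\hat{x} - P_M x\| \le \|x - P_M x\|_{1,w} + \|\hat{x} - P_M x\|$, the result follows. □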
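
The key step of this proof, $\sigma_{\min}^2 \ge 1 - E_2(h,M)$, can be checked numerically. The sketch below is illustrative and rests on assumptions not fixed by the text above: Legendre polynomials orthonormal with respect to $\nu = 1/2$, equispaced points, Voronoi quadrature weights, noise-free data and the test function $f = \exp$.

```python
import numpy as np
from numpy.polynomial import legendre

N, M = 200, 8
t = np.linspace(-1.0, 1.0, N)                          # equispaced sample points
edges = np.concatenate(([-1.0], 0.5 * (t[1:] + t[:-1]), [1.0]))
tau = 0.5 * np.diff(edges)                             # Voronoi weights for nu = 1/2

# Columns are phi_i = sqrt(2d+1) P_d for degrees d = 0, ..., M-1.
Phi = legendre.legvander(t, M - 1) * np.sqrt(2 * np.arange(M) + 1)
UPM = np.sqrt(tau)[:, None] * Phi                      # the N x M matrix U P_M

E2 = np.linalg.norm(np.eye(M) - UPM.T @ UPM, 2)        # E_2(h, M) from (5.4)
sigma_min = np.linalg.svd(UPM, compute_uv=False)[-1]   # smallest singular value

f = np.exp
y = np.sqrt(tau) * f(t)                                # noise-free, weighted samples
x_hat, *_ = np.linalg.lstsq(UPM, y, rcond=None)        # least-squares coefficients
max_error = np.max(np.abs(Phi @ x_hat - f(t)))         # error of the fit on the grid
```

Here `E2` is well below one, so the least-squares problem is well conditioned and the fit is accurate.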

The error bound (5.13) is similar to the bound (5.8) for weighted minimization. In the absence
of noise and truncation error, both depend on the term $x - P_M x$, i.e. the tail of $x$ beyond its first $M$
coefficients. The primary difference is in the size of $M$, which is determined through the conditions
of Theorems 5.6 and 5.7 respectively. We shall discuss this point further in §7 and §8. But we first
reiterate (see also §3.3) that $M$ is a fixed parameter for least squares, required for implementation,
whereas for weighted minimization it is introduced solely to provide an estimate for the approximation error.
Lemma ED asserts that weighted minimization can recover coefficients of x corresponding to 
other subsets $\Delta$, provided the various conditions hold. Note also that for weighted minimization
the parameter $M$ appearing in Theorem 5.6 is in no way related to the truncation parameter $K$,
besides the relation $M \le K$.

Remark 5.8 The error bound (5.13) also differs from (5.8) in that it holds for the weights $w_i =
\|\phi_i\|_{L^\infty}$ and its noise term does not involve the factor $\|P_M w\|$. See §7 for further discussion. We
also remark that the term $\|x - P_M x\|_{1,w}$ in (5.13) can be improved slightly to $\|f - f_M\|_{L^\infty}$, where
$f_M$ is the projection of $f$ onto $\mathrm{span}\{\phi_1, \ldots, \phi_M\}$. We opt for (5.13) so that a direct
comparison can be made with (5.8). Note that one also has $\|x - P_M x\| \le \|f - f_M\|_{L^\infty} \le \|x - P_M x\|_{1,w}$.

Remark 5.9 Theorems 5.6 and 5.7 provide a recipe for determining the worst-case behaviour
of weighted minimization for scattered data. This is as follows. Given an orthonormal system
$\{\phi_i\}_{i \in \mathbb{N}}$, an $h > 0$ and an $0 < \varepsilon < 1/2$, determine:

1. the largest $M = M_1(h)$ such that $E_2(h,M) \le \varepsilon$,

2. the largest $M = M_2(h)$ such that (5.6) holds for some appropriate $R$ and $\{w_i\}_{i \in \mathbb{N}}$.

In this case, the errors for both weighted minimization and least-squares fitting are determined
by $\|x - P_M x\|_{1,w}$, where $M = M_2(h)$ for the former and $M = M_1(h)$ for the latter. Hence, if
$M_1(h) \asymp M_2(h)$ as $h \to 0$ it follows that both least-squares fitting and weighted minimization
are guaranteed to converge at roughly the same asymptotic rate as $h \to 0$. Since $M_1(h)$ and $M_2(h)$
are dependent on the system $\{\phi_i\}_{i \in \mathbb{N}}$, a separate analysis must be carried out in each case. The last
two sections of this paper will be devoted to doing this for the examples of §2.4.
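
Step 1 of this recipe can be carried out numerically for a given point set. The following is a minimal sketch under stated assumptions (Legendre polynomials orthonormal with respect to $\nu = 1/2$, Voronoi quadrature weights, sorted points in $[-1,1]$); the function names are hypothetical. It uses the fact that $E_2(h,M)$ is nondecreasing in $M$, since $P_M - P_M U^* U P_M$ is a principal submatrix of $P_{M+1} - P_{M+1} U^* U P_{M+1}$.

```python
import numpy as np
from numpy.polynomial import legendre

def voronoi_weights(t):
    """tau_n = nu-measure of the Voronoi cell of t_n in [-1, 1], assuming nu = 1/2."""
    t = np.asarray(t, dtype=float)
    edges = np.concatenate(([-1.0], 0.5 * (t[1:] + t[:-1]), [1.0]))
    return 0.5 * np.diff(edges)

def sampling_matrix(t, M):
    """U P_M with entries sqrt(tau_n) * phi_i(t_n), phi_i = sqrt(2d+1) P_d, d = i - 1."""
    tau = voronoi_weights(t)
    Phi = legendre.legvander(t, M - 1) * np.sqrt(2 * np.arange(M) + 1)
    return np.sqrt(tau)[:, None] * Phi

def E2(t, M):
    """E_2(h, M) = || P_M - P_M U* U P_M || in the spectral norm."""
    U = sampling_matrix(t, M)
    return np.linalg.norm(np.eye(M) - U.T @ U, 2)

def largest_M(t, eps):
    """Largest M <= len(t) with E_2(h, M) <= eps (E_2 is nondecreasing in M)."""
    M = 0
    while M + 1 <= len(t) and E2(t, M + 1) <= eps:
        M += 1
    return M

M1 = largest_M(np.linspace(-1.0, 1.0, 100), eps=0.5)
```

Step 2 would additionally require a choice of $R$ and of the weights, so it is not sketched here.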

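5.5 Proof of Lemma 5.1

To complete this section we give the proof of Lemma 5.1.

Proof of Lemma 5.1. Let $v = \hat{x} - x$. Then $A P_\Delta v = P_\Delta U^* U v - P_\Delta U^* U P_\Delta^\perp v$, where $A$ is the restriction
of $P_\Delta U^* U P_\Delta$ to $P_\Delta(\ell^2(\mathbb{N}))$. By (i), we have $\|A^{-1}\| \le \frac{1}{1-\alpha}$ and

\[ \|P_\Delta U^*\|^2 = \|U P_\Delta\|^2 = \|P_\Delta U^* U P_\Delta\| \le 1 + \alpha. \]

Thus

\[ \|P_\Delta v\| \le \frac{1}{1-\alpha} \|P_\Delta U^* U v\| + \frac{1}{1-\alpha} \|P_\Delta U^* U P_\Delta^\perp v\| \le \frac{\sqrt{1+\alpha}}{1-\alpha} \left( \|U v\| + \|U P_\Delta^\perp v\| \right). \]

Observe that

\[ \|U v\| = \|U \hat{x} - U x\| \le 2\eta. \tag{5.14} \]

Hence

\[ \|P_\Delta v\| \le \frac{\sqrt{1+\alpha}}{1-\alpha} \left( 2\eta + \|U P_\Delta^\perp v\| \right). \]

The second term can be estimated as follows:

\[ \|U P_\Delta^\perp v\| \le \sum_{i \notin \Delta} |v_i| \, \|U e_i\| \le \beta \|P_\Delta^\perp v\|_{1,w}, \]

where the latter inequality is due to (ii). Hence we get

\[ \|P_\Delta v\| \le \frac{\sqrt{1+\alpha}}{1-\alpha} \left( 2\eta + \beta \|P_\Delta^\perp v\|_{1,w} \right). \tag{5.15} \]

We shall return to this inequality later, but let us now consider $\hat{x}$. We have

\[ \|\hat{x}\|_{1,w} = \|P_\Delta(x + v)\|_{1,w} + \|P_\Delta^\perp(x + v)\|_{1,w} \]
\[ \ge \mathrm{Re}\langle P_\Delta W (x + v), \mathrm{sign}(P_\Delta x) \rangle + \|P_\Delta^\perp v\|_{1,w} - \|P_\Delta^\perp x\|_{1,w} \]
\[ = \mathrm{Re}\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle + \|P_\Delta x\|_{1,w} + \|P_\Delta^\perp v\|_{1,w} - \|P_\Delta^\perp x\|_{1,w} \]
\[ = \mathrm{Re}\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle + \|x\|_{1,w} + \|P_\Delta^\perp v\|_{1,w} - 2\|P_\Delta^\perp x\|_{1,w}. \tag{5.16} \]

Now let $\tilde{x} \in P_K(\ell^1_w(\mathbb{N}))$ be any feasible solution for (3.6). Then $\|\hat{x}\|_{1,w} \le \|\tilde{x}\|_{1,w}$ and we get

\[ \|\tilde{x}\|_{1,w} \ge \mathrm{Re}\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle + \|x\|_{1,w} + \|P_\Delta^\perp v\|_{1,w} - 2\|P_\Delta^\perp x\|_{1,w}. \]

After rearranging this gives

\[ \|P_\Delta^\perp v\|_{1,w} \le |\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle| + 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w}. \tag{5.17} \]

We next estimate $|\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle|$. We have

\[ |\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle| \le |\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) - P_\Delta u \rangle| + |\langle W v, u \rangle| + |\langle P_\Delta^\perp W v, P_\Delta^\perp u \rangle|. \]

Note that $|\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) - P_\Delta u \rangle| \le \gamma \|P_\Delta v\|$ by (iii) and also that $\langle W v, u \rangle = \langle v, W u \rangle =
\langle v, U^* u' \rangle = \langle U v, u' \rangle$. Hence, (5.14) and (v) give

\[ |\langle W v, u \rangle| \le \|U v\| L\sqrt{s} \le 2\eta L\sqrt{s}. \]

Finally, by (iv), we have $|\langle P_\Delta^\perp W v, P_\Delta^\perp u \rangle| \le \|P_\Delta^\perp u\|_\infty \|P_\Delta^\perp v\|_{1,w} \le \theta \|P_\Delta^\perp v\|_{1,w}$, and therefore

\[ |\langle P_\Delta W v, \mathrm{sign}(P_\Delta x) \rangle| \le \gamma \|P_\Delta v\| + 2\eta L\sqrt{s} + \theta \|P_\Delta^\perp v\|_{1,w}. \]

Substituting into (5.17) and rearranging now yields

\[ (1 - \theta) \|P_\Delta^\perp v\|_{1,w} \le \gamma \|P_\Delta v\| + 2\eta L\sqrt{s} + 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w}, \]

and applying (5.15) gives

\[ \|P_\Delta v\| \le \frac{\sqrt{1+\alpha}}{1-\alpha} \left[ 2\eta + \frac{\beta}{1-\theta} \left( \gamma \|P_\Delta v\| + 2\eta L\sqrt{s} + 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w} \right) \right]. \]

Hence

\[ \|P_\Delta v\| \le \left( 1 - \frac{\sqrt{1+\alpha}\,\beta\gamma}{(1-\alpha)(1-\theta)} \right)^{-1} \frac{\sqrt{1+\alpha}}{1-\alpha} \left[ 2\left( 1 + \frac{\beta}{1-\theta} L\sqrt{s} \right)\eta + \frac{\beta}{1-\theta} \left( 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w} \right) \right] \]
\[ = 2 C_0 \left( 1 + \frac{\beta}{1-\theta} L\sqrt{s} \right)\eta + C_0 \frac{\beta}{1-\theta} \left( 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w} \right). \]

Since $\|P_\Delta^\perp v\| \le \|P_\Delta^\perp v\|_{1,w}$ (recall (2.4)), we now get

\[ \|v\| \le \|P_\Delta v\| + \|P_\Delta^\perp v\|_{1,w} \le \left( 1 + \frac{\gamma}{1-\theta} \right) \|P_\Delta v\| + \frac{1}{1-\theta} \left( 2\eta L\sqrt{s} + 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w} \right) \]
\[ \le 2\left[ \left( 1 + \frac{\gamma}{1-\theta} \right) C_0 + \left( \left( 1 + \frac{\gamma}{1-\theta} \right) \frac{\beta}{1-\theta} C_0 + \frac{1}{1-\theta} \right) L\sqrt{s} \right]\eta
+ \left[ \left( 1 + \frac{\gamma}{1-\theta} \right) \frac{\beta}{1-\theta} C_0 + \frac{1}{1-\theta} \right] \left( 2\|P_\Delta^\perp x\|_{1,w} + \|x - \tilde{x}\|_{1,w} \right), \]

as required. □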


6 Handling truncation: the choice of K 

We now consider the truncation parameter $K$. Due to Theorem 5.6 it suffices to estimate the
quantity $T_{h,K,\eta}(x)$ defined in (5.7).

Theorem 6.1. For all sufficiently large $K$ we have $\mathrm{Ran}(U) = \mathrm{Ran}(U P_K)$. In particular, the
problems (3.5) and (3.6) have solutions for all large $K$. Moreover, suppose that $r = \mathrm{rank}(U) \le N$
and $K$ is sufficiently large so that $\mathrm{Ran}(U P_K) = \mathrm{Ran}(U)$. If $x \in \ell^1_w(\mathbb{N})$ then

\[ T_{h,K,\eta}(x) \le \|x - P_K x\|_{1,w} + \frac{\|P_K w\|}{\sigma_{\min}} \|x - P_K x\|_{1,w}, \]

where $\sigma_{\min}$ is the minimum singular value of $U P_K$. Moreover, if $\{w_i\}_{i \in \mathbb{N}}$ is nondecreasing, and
$x \in \ell^1_{\tilde{w}}(\mathbb{N})$, where $\tilde{w} = \{\tilde{w}_i\}_{i \in \mathbb{N}}$ with $\tilde{w}_i \ge \sqrt{i}\, w_i^2$, $\forall i \in \mathbb{N}$, then

\[ T_{h,K,\eta}(x) \le \|x - P_K x\|_{1,w} + \frac{1}{\sigma_{\min}} \|x - P_K x\|_{1,\tilde{w}}. \]

Proof. The first observation is immediate since $U$ has finite rank. Suppose now that $K$ is such that
$\mathrm{Ran}(U P_K) = \mathrm{Ran}(U)$ and write $\tilde{x} = P_K x + (U P_K)^\dagger U(x - P_K x)$. Then

\[ \|U P_K \tilde{x} - y\| = \left\| U P_K x + U(x - P_K x) - U x - \{\sqrt{\tau_n} e_n\}_{n=1}^N \right\| \le \eta. \]

Hence $\tilde{x}$ is feasible. Moreover, $\|x - \tilde{x}\|_{1,w} \le \|x - P_K x\|_{1,w} + \|(U P_K)^\dagger U(x - P_K x)\|_{1,w}$ and

\[ \|(U P_K)^\dagger U(x - P_K x)\|_{1,w}^2 \le \|P_K w\|^2 \|(U P_K)^\dagger U(x - P_K x)\|^2 \]
\[ \le \|P_K w\|^2 \|U(x - P_K x)\|^2 / \sigma_{\min}^2 \]
\[ \le \|P_K w\|^2 \|x - P_K x\|_{1,w}^2 / \sigma_{\min}^2. \]

To obtain the final result, we note that

\[ \|P_K w\| \, \|x - P_K x\|_{1,w} \le w_K \sqrt{K} \sum_{i > K} w_i |x_i| \le \sum_{i > K} \tilde{w}_i |x_i| = \|x - P_K x\|_{1,\tilde{w}}, \]

whenever the $w_i$'s are nondecreasing, as required. □

Note that it need not be the case that (3.5) or (3.6) have solutions for arbitrary $K \ge N$, since
$\mathrm{Ran}(U P_K) \ne \mathrm{Ran}(U)$ in general. However, this theorem shows that this holds for all large $K$.
Furthermore, this result shows that once $K$ is chosen so that $1/\sigma_{\min}$ is moderate in size, the effect
of truncation is bounded by the decay of the coefficients $x_i$, $i > K$.

Remark 6.2 Theorem 6.1 asserts that the truncation parameter $K$ can be chosen independently
of the coefficients whenever $x \in \ell^1_{\tilde{w}}(\mathbb{N})$. While it is possible to show that $T_{h,K,\eta}(x) \to 0$ as $K \to \infty$

for X € O Prop. 6 . 6 ], it is currently unknown whether a bound for Th^K,ri{x) involving only 

$\|x - P_K x\|_{1,w}$ holds. In other words, if $x \in \ell^1_w(\mathbb{N})$ but $x \notin \ell^1_{\tilde{w}}(\mathbb{N})$, the truncation strategy may no
longer be independent of $x$. Improving this result is an open problem.


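It is important to quantify precisely how large $K$ needs to be in relation to $N$ to ensure a small
truncation error. For this, we shall use the following lemma:

Lemma 6.3. The minimum singular value $\sigma_{\min}$ of $U P_K$ satisfies

\[ \sigma_{\min} \ge \inf_{\substack{y \in \mathbb{C}^N \\ \|y\| = 1}} \ \sup_{g \in G_y} \ \sup_{\substack{\phi \in \Phi_K \\ \phi \ne 0}} \left\{ \frac{1 - \|g - \phi\|_{L^\infty}}{\|g\|_{L^2_\nu} + \|g - \phi\|_{L^\infty}} \right\}, \tag{6.1} \]

where $\Phi_K = \mathrm{span}\{\phi_1, \ldots, \phi_K\}$ and $G_y = \{ g \in L^2_\nu(D) \cap L^\infty(D) : g(t_n) = y_n / \sqrt{\tau_n},\ n = 1, \ldots, N \}$.

Proof. The minimum singular value is given by $\sigma_{\min} = \inf_{y \in \mathbb{C}^N, \|y\| = 1} \|(U P_K)^* y\|$. Fix $y \in \mathbb{C}^N$, $\|y\| = 1$
and observe that

\[ \|(U P_K)^* y\| = \sup_{\substack{z \in \mathbb{C}^K \\ \|z\| = 1}} |\langle y, U P_K z \rangle| = \sup_{\substack{\phi \in \Phi_K \\ \|\phi\|_{L^2_\nu} = 1}} \left| \sum_{n=1}^N \sqrt{\tau_n}\, y_n \overline{\phi(t_n)} \right|. \]

Let $g \in G_y$ and $\phi \in \Phi_K$, $\phi \ne 0$. Then

\[ \|(U P_K)^* y\| \ge \frac{1}{\|\phi\|_{L^2_\nu}} \left| \sum_{n=1}^N \sqrt{\tau_n}\, y_n \overline{\phi(t_n)} \right|
\ge \frac{1}{\|\phi\|_{L^2_\nu}} \left( \left| \sum_{n=1}^N \sqrt{\tau_n}\, y_n \overline{g(t_n)} \right| - \left| \sum_{n=1}^N \sqrt{\tau_n}\, y_n \overline{(g(t_n) - \phi(t_n))} \right| \right)
\ge \frac{\|y\|^2 - \|y\| \, \|g - \phi\|_{L^\infty}}{\|\phi\|_{L^2_\nu}}. \]

Since $\|\phi\|_{L^2_\nu} \le \|g\|_{L^2_\nu} + \|g - \phi\|_{L^2_\nu} \le \|g\|_{L^2_\nu} + \|g - \phi\|_{L^\infty}$, the result now follows immediately. □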


Remark 6.4 In practice, rather than performing an analysis of $\sigma_{\min}$ via Lemma 6.3, one may
choose $K$ simply by calculating the minimal singular value of $U P_K$. If this is sufficiently large, then
Theorem 6.1 guarantees that the truncation error is moderate.
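
The practical strategy of Remark 6.4 can be sketched as follows. This is a hedged illustration, not the procedure used for the numerical results: it assumes Legendre polynomials orthonormal with respect to $\nu = 1/2$, Voronoi quadrature weights and sorted points in $[-1,1]$, and the function names and the stopping rule are hypothetical. Since $U P_{K+1} (U P_{K+1})^* = U P_K (U P_K)^* + u u^*$ for the added column $u$, the smallest singular value of the $N \times K$ matrix is nondecreasing in $K$ for $K \ge N$, so a simple upward search suffices.

```python
import numpy as np
from numpy.polynomial import legendre

def sigma_min_UPK(t, K):
    """Smallest singular value of the N x K matrix U P_K (assumes K >= N = len(t))."""
    t = np.asarray(t, dtype=float)
    edges = np.concatenate(([-1.0], 0.5 * (t[1:] + t[:-1]), [1.0]))
    tau = 0.5 * np.diff(edges)                          # Voronoi weights for nu = 1/2
    Phi = legendre.legvander(t, K - 1) * np.sqrt(2 * np.arange(K) + 1)
    return np.linalg.svd(np.sqrt(tau)[:, None] * Phi, compute_uv=False)[-1]

def choose_K(t, threshold, K_max):
    """Smallest K in [N, K_max] with sigma_min(U P_K) >= threshold, or None if there is none."""
    for K in range(len(t), K_max + 1):
        if sigma_min_UPK(t, K) >= threshold:
            return K
    return None

t = np.linspace(-1.0, 1.0, 20)
sigmas = [sigma_min_UPK(t, K) for K in (20, 40, 80)]    # K = N, 2N, 4N
```

A caller would pick `threshold` so that $1/\sigma_{\min}$ is moderate, in the sense of Theorem 6.1.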


6.1 Examples 

Lemma 6.3 allows one to determine the required condition on $K$. Note that this depends completely
on the points $T$ and the basis $\{\phi_i\}_{i \in \mathbb{N}}$. We now do this for the two examples of §2.4.

Theorem 6.5. For $\alpha, \beta > -1$ let $\{\phi_i\}_{i \in \mathbb{N}}$ be the Jacobi polynomial basis (2.6) and let $T = \{t_n\}_{n=1}^N$
be an ordered set of points in $[-1,1]$. Then for every $r \in \mathbb{N}$ there exists a $C_r > 0$ such that

where $\sigma_{\min}$ is the minimal singular value of $U P_K$,

1 min {tn+i-tn}, h= sup min \t-tn\, 

2 n=0,...,N n=l,...,N 

and to = —ti — 2, = 2 — In particular, if 

1 

K > (y2Gr{l + e"b) " 
for some $0 < \varepsilon < 1$ then $\sigma_{\min} \ge 1 - \varepsilon$.

We defer the proof of this result until §7.2. Note that in the case of equispaced data, we have
h = ^ = 1/N and so it suffices to take K > for any r G N. In practice, we have found 

that $K = 4N$ is sufficient in all examples (recall also Remark 6.4). On the other hand, if the data
clusters severely, then this theorem suggests that a larger value of $K$ may be necessary.


Theorem 6.6. Let $\{\phi_i\}$ be the Fourier basis (2.8) and let $T = \{t_n\}_{n=1}^N$ be a set of $N$ ordered
points in $[-1,1]$. Then for every $r \in \mathbb{N}$ there exists a $C_r > 0$ such that

where $\sigma_{\min}$ is the minimal singular value of $U P_K$,


Z n=0,...,A' 


h = sup 

ri= 



and to = —A, tw+i = 1 - In particular, if 


1 


K > 



2Cr{l + e-^ 



for some 0 < e < 1 then 0 < 7 < e. 

This result is exactly the same as that for Jacobi polynomials (Theorem 6.5), except for a
minor change in the definition of the values $t_0$ and $t_{N+1}$. Its proof is therefore near-identical
and is omitted.

7 Jacobi polynomials on the unit interval 

We now consider Example 2.1. For convenience, we recall the growth condition (2.7):



Our main result is the following: 

Theorem 7.1. For $\alpha, \beta > -1$ let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6), $T =
\{t_n\}_{n=1}^N \subseteq [-1,1]$ be a set of scattered points and suppose that $h$ is as in (2.2). Suppose that the
weights $w_i = i^\gamma \|\phi_i\|_{L^\infty}$ for some $\gamma > 0$. Then for $0 < \varepsilon < 1/2$ there exists a $c(\varepsilon) > 0$ such that if



0 < 7 < 1/2 — q 
7 > 1/2 - g 


(7.2) 


where $q$ is given by (7.1), any minimizer $\hat{x}$ of (3.6) satisfies


x-x\\ < C{e) + ||x - Pmx\\i,w + TN,K,rj{x)) , 


for some constant $C(\varepsilon)$ depending on $\varepsilon$ only, where $T_{h,K,\eta}(x)$ is as in (5.7).


One also has a similar, albeit simpler, result for least squares: 

Theorem 7.2. Let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6), $T = \{t_n\}_{n=1}^N \subseteq [-1,1]$
be a set of scattered points and suppose that $h$ is as in (2.2). Then for each $0 < \varepsilon < 1$ there exists
a $c(\varepsilon) > 0$ such that if

\[ h \le c(\varepsilon) M^{-2}, \tag{7.3} \]

then the solution $\hat{x}$ of (3.8) exists uniquely and satisfies

\[ \|\hat{x} - x\| \le \left(1 + \frac{1}{\sqrt{1-\varepsilon}}\right) \|x - P_M x\|_{1,w} + \frac{\eta}{\sqrt{1-\varepsilon}}, \]

for any $w = \{w_i\}_{i \in \mathbb{N}}$ with $w_i \ge \|\phi_i\|_{L^\infty}$.

Theorems 7.1 and 7.2 assert that both techniques guarantee an approximation error that depends
on $x - P_M x$ measured in some norm, for the same asymptotic scaling of $h$ with $M$ up to
log factors. Hence, up to the choice of norm, weighted minimization with scattered data, Jacobi
polynomials and sufficiently large weights $w_i$ is guaranteed a similar approximation rate to
least-squares fitting.

It is informative to now consider the following two cases: 


Smooth functions. Let $f \in C^\infty([-1,1])$. Then the coefficients $x_i = O(i^{-k})$ as $i \to \infty$ for any
$k > 0$. Hence $\|x - P_M x\|_{1,w} = O(M^{-k})$ as $M \to \infty$ for any $k > 0$ whenever the weights $w_i$ grow
at most algebraically fast in $i$. By Theorems 7.1 and 7.2 the approximation errors for weighted
minimization and least squares both decay superalgebraically fast in $h$ as $h \to 0$; that is, faster
than $h^k$ for any $k > 0$.
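
The decay of the weighted tail $\|x - P_M x\|_{1,w}$ for a smooth function can be observed directly. The sketch below is illustrative only and makes several assumptions: the Legendre case with $\nu = 1/2$, so that $\phi_i = \sqrt{2i-1}\, P_{i-1}$ and $\|\phi_i\|_{L^\infty} = \sqrt{2i-1}$, weights $w_i = i^\gamma \|\phi_i\|_{L^\infty}$ with $\gamma = 1$, the test function $f = \exp$, and coefficients computed by Gauss-Legendre quadrature; the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_coefficients(f, n_coef, n_quad=64):
    """Coefficients x_i = <f, phi_i> w.r.t. nu = 1/2, via Gauss-Legendre quadrature."""
    nodes, wts = legendre.leggauss(n_quad)
    Phi = legendre.legvander(nodes, n_coef - 1) * np.sqrt(2 * np.arange(n_coef) + 1)
    return Phi.T @ (0.5 * wts * f(nodes))

def weighted_tail(x, M, gamma):
    """|| x - P_M x ||_{1,w} with w_i = i**gamma * sqrt(2i - 1), i = 1, 2, ..."""
    i = np.arange(1, len(x) + 1)
    w = i ** gamma * np.sqrt(2 * i - 1)
    return np.sum(w[M:] * np.abs(x[M:]))

x = legendre_coefficients(np.exp, 40)                   # first 40 coefficients of exp
tails = [weighted_tail(x, M, gamma=1.0) for M in (5, 10, 15)]
```

The tail drops by several orders of magnitude each time $M$ increases by five, until it reaches the level of rounding error.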


Analytic functions. Suppose $f$ is analytic so that $x_i = O(\rho^{-i})$ for some $\rho > 1$. For algebraic
weights $w_i$ one has $\|x - P_M x\|_{1,w} = O((\rho')^{-M})$ as $M \to \infty$ for any $1 < \rho' < \rho$. Therefore the
approximation error for least squares behaves like \\x — x|| = 0{{p')~^/^) as /i —)• 0, and for 
weighted minimization one has the marginally slower convergence ||x — x|| = 0{{p')~^^^^ 
provided the weights $w_i = i^\gamma \|\phi_i\|_{L^\infty}$ with $\gamma > 1/2 - q$.


Remark 7.3 On the other hand, for functions of finite smoothness, the need for more rapidly-growing
weights in Theorem 7.1 translates into slower algebraic convergence of the approximation
than that of least squares. We expect that this may be an artifact of the proof, however.


Suppose now that the data $T$ is equispaced. We first note the following general result:

Theorem 7.4. Let $T = \{t_n\}_{n=1}^N$ be an equispaced grid of $N$ points in $[-1,1]$, $E \subseteq \mathbb{C}$ be a compact
set containing $[-1,1]$ in its interior and let $B(E)$ be the Banach space of functions continuous on
$E$ and analytic in its interior, with norm $\|f\|_E = \sup_{z \in E} |f(z)|$. Let $F : B(E) \to L^\infty(-1,1)$ be
a mapping such that for each $f \in B(E)$, $F(f)$ depends only on the data $\{f(t_n)\}_{n=1}^N$, and suppose
that, for constants $C > 0$, $\sigma > 1$ and $1/2 < \tau \le 1$,

\[ \|f - F(f)\|_{L^\infty} \le C \sigma^{-N^\tau} \|f\|_E, \quad \forall f \in B(E). \]

If $\|f\|_{T,\infty} = \max_{n=1,\ldots,N} |f(t_n)|$ then there exists a constant $\nu > 1$ such that

\[ \Theta(F) = \sup_{\substack{f \in B(E) \\ \|f\|_{T,\infty} \ne 0}} \left\{ \frac{\|F(f)\|_{L^\infty}}{\|f\|_{T,\infty}} \right\} \ge \nu^{N^{2\tau - 1}}. \tag{7.4} \]

This theorem is due to Platte, Trefethen & Kuijlaars [26] (a minor modification is made in (7.4)
which is more suitable for our purposes). It states the following: for any method that achieves an
error for all functions in $B(E)$ that is exponentially-decaying as $N \to \infty$ with rate $\tau$ it is possible to
find a function $f \in B(E)$ which is bounded on the set $T$, but for which $\|F(f)\|_{L^\infty}$ is exponentially
large with index $2\tau - 1$. In particular, the best possible convergence rate for a robust method, i.e.
one for which $\Theta(F) \le C$ for all $N \in \mathbb{N}$, is root-exponential in $N$. Note that this theorem is very
general: the method $F$ can be linear or nonlinear, and $F(f)$ only needs to be defined for extremely
smooth (specifically, analytic) functions.

Now consider the cases of weighted minimization and least-sqnares fitting with Jacobi poly¬ 
nomials. By earlier argnments, the corresponding approximation errors behave like 0{{p')~'^) 
and respectively as N ^ oo. Moreover, by setting x = 0 in Theorems 17.11 and 

17.21 respectively, one deduces that 0(F) < I for the former and 0(F) < (A^/log(A^))^^"'“'^'''^^/^ for 
the latter. According to Theorem 7.4, least-squares fitting attains the best possible convergence
rate for a robust method, while weighted minimization (with sufficiently large weights) attains
nearly the best possible convergence rate, with only slow, algebraic growth of $\Theta(F)$.

Remark 7.5 Although Theorem 7.4 applies only to equispaced data, it can also be formulated
for much more general data. Loosely speaking, similar conclusions apply unless the data clusters
quadratically at the endpoints $x = \pm 1$ [8].

Remark 7.6 For general data, the constant in Theorem 7.1 implies mild ill-conditioning
of weighted minimization as $h \to 0^+$. This is not seen in computations, and we expect it is also
an artifact of the proof. Removing this factor is an open problem.

7.1 Numerical examples 

In Figs. 4 and 5 we give results for polynomial approximations from equispaced and jittered data.
Although it has only been proved that weighted minimization performs as well (up to a log
factor) as least-squares fitting, these results show that it in fact exhibits rather better performance,
similar to that of the best possible least-squares fit. Note that this oracle least squares cannot be
implemented in practice since $M$ is calculated by minimizing the approximation error.


7.2 Proofs 


The proof of Theorem 7.1 relies on the following three lemmas, which provide estimates for the
quantities $E_2(h,M)$, $E_\infty(h,M)$ and $F(h,M,R)$ respectively.

Lemma 7.7. For $\alpha, \beta > -1$ let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6), $T =
\{t_n\}_{n=1}^N \subseteq D$ be a set of scattered data points and suppose that $h$ is as in (2.2). If $hM^2 \le 1$ then
$E_2(h,M) \lesssim \sqrt{h} M$, where $E_2(h,M)$ is as in (5.4).

Proof. By self-adjointness,

\[ \|P_M - P_M U^* U P_M\| = \sup_{\substack{x \in P_M(\ell^2(\mathbb{N})) \\ \|x\| = 1}} |\langle (P_M - P_M U^* U P_M) x, x \rangle|. \tag{7.5} \]

Let $x \in P_M(\ell^2(\mathbb{N}))$, $\|x\| = 1$ be arbitrary and set $g = \sum_{i=1}^M x_i \phi_i \in \mathbb{P}_{M-1}$ so that $\|g\|_{L^2_{(\alpha,\beta)}} = 1$.
Let $\{V_n\}_{n=1}^N$ be the Voronoi cells of the points $\{t_n\}_{n=1}^N$ and set $\chi(t) = \sum_{n=1}^N g(t_n) \mathbb{1}_{V_n}(t)$. Then, by

Figure 4: Numerical comparison of weighted minimization and least-squares fitting for approximation
from equispaced data using Legendre polynomials. Panels: $f(x) = \cosh(100x^2)/\cosh(100)$, $f(x) = |x|^3$ and
$f(x) = \sin(80x)$. The error against $N$ is plotted for each method. The
solid black line is weighted minimization with $K = 4N$ and weights $w_i = i$. The dashed lines are least
squares with $M = c\sqrt{N}$ and $c = 0.5, 1.0, 1.5, 2, 2.5, 3.0$. The solid blue line is oracle least squares based on
choosing $M$ between $1$ and $N$ which minimizes the error $\|f - \tilde{f}\|_{L^\infty}$ for a given $N$ and $f$. Random noise of
magnitude $10^{-8}$ was added to the data.

Figure 5: The same as Fig. 4 except for randomly jittered data.

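the definition (2.3) of the weights we have $\|\chi\|^2_{L^2_{(\alpha,\beta)}} = \sum_{n=1}^N \tau_n |g(t_n)|^2$, and therefore

\[ |\langle (P_M - P_M U^* U P_M) x, x \rangle| = \left| 1 - \sum_{n=1}^N \tau_n |g(t_n)|^2 \right| = \left| \|g\|^2_{L^2_{(\alpha,\beta)}} - \|\chi\|^2_{L^2_{(\alpha,\beta)}} \right|. \]

Hence

\[ |\langle (P_M - P_M U^* U P_M) x, x \rangle| \le 2 \|g - \chi\|_{L^2_{(\alpha,\beta)}} \|g\|_{L^2_{(\alpha,\beta)}} + \|g - \chi\|^2_{L^2_{(\alpha,\beta)}}, \tag{7.6} \]

and so it suffices to show that

\[ \|g - \chi\|_{L^2_{(\alpha,\beta)}} \lesssim \sqrt{h} M \|g\|_{L^2_{(\alpha,\beta)}}. \tag{7.7} \]

We have

\[ \|g - \chi\|^2_{L^2_{(\alpha,\beta)}} = \sum_{n=1}^N \int_{V_n} |g(t) - g(t_n)|^2 \nu^{(\alpha,\beta)}(t)\,dt = \sum_{n=1}^N I_n. \]

Let $n_0$ be the unique number such that $0 \in V_{n_0}$. Then we write this as

\[ \|g - \chi\|^2_{L^2_{(\alpha,\beta)}} = \sum_{n=n_0+1}^N I_n + \sum_{n=1}^{n_0-1} I_n + I_{n_0} = S_1 + S_{-1} + S_0. \tag{7.8} \]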

We shall address each term separately. Consider ^i. Since < (1 — t)“ on [0,1], we have 


In < 


f l5(i)-5(in)|^(l-t)"dt, n = no,...,N. 

JVr, 


We now consider three cases: (i) — 1 < a < 0, (ii) a = 0 and (iii) a > 0. Consider case (i). Then 


In < 


'Vn 


>Vn 


|ff'(s)|ds 


JVn JVn JVr, 


dt 


By construction, Vn is of width at most 2h. Hence, after a short calculation we get that 

In<h [ |5'(t)|^(l - t)"+Mt, -1 < a < 0. 

JVn 


Now consider case (ii). We have 


In< dt 

JVn 


'Vn 


l5'(s)|ds 


<([ dX[ w{trdt<h^ [ wit^dt, 

\JVn / JVn JVn 


0 = 0 . 


Final, consider case (iii). By similar arguments, we get that 


In < 


([ (l-t)“ Al-s)-“-Msdd / \g'{t)\\l-tr+^dt. 

\JVn Jtn J JVn 


Write Vn = (a, b) where 0 < a < and tn < b < 1. Then 

rt 1 rb 


[ [ {l-s)-^-^dsdt=- [ (1-t)" ((1-t)-“ - (1-t„)-“) dt 

JVn Jtn ^ J a 

1 (l-6)“+i-(l-a)“+i 


1 

a 


{b- a) + 


a + 1 (1-tn)'^ 
Note that $1 - b \le 1 - t_n$ and $1 - a \ge 1 - t_n$. Therefore


[ {1-tr Ai 

J Vn tn 


ds dt < 


a 


b — a — 


b — a 


a + 1 


<h. 


Hence 


In < 


h f \g'dt, a > 0. 

Jv„ 


With these estimates to hand, we now deduce the following bound for the term Si in (|7.8I) : 


Si< 


^^Ils''lli2(0,i) “ = 0 

h\W\\l2 a/0 ’ 


(7.9) 


This follows from the fact that the Vn form a partition of (—1,1), the dehnition of no and the fact 
that (1 +> 1 for t G [0,1]. Identical arguments give a similar result for S-i: 


S-i 


< 

rv-/ 


■^^(o+l./3+l) 


(7.10) 


Finally, consider Sq. Let Vn^ = J-i U Ji, where Ji C [0,1] and J_i C [—1,0]. If hM‘^ < 1 we may 
assume that h < 1/2 so that |1 — t| > 1/2 and |1 +1| > 1/2 for t G 14,o- Then we have 


>^±1 < 




\g\t)\^dt<h^ [ 
Jj±i 


■JJ±i 


for any — 1 < 7,5 < 0. It now follows that J±i satisfy exactly the same bounds as (HJ]) and 
()7.10p for S±i. Therefore, in order to estimate the left-hand side of (17.8h it suffices from now on to 
consider only For this, we shall use Markov’s inequality: 

\[ \|g'\|_{L^2(I)} \lesssim \frac{M^2}{|I|} \|g\|_{L^2(I)}, \quad \forall g \in \mathbb{P}_M, \tag{7.11} \]

where $I$ is an arbitrary bounded interval, as well as the following Markov-type inequality:


lb'I 


7(a + l,/3 + l) 


Mibllj 




yg G Pm- 


(7.12) 


Markov's inequality (7.11) is well-known. A proof of (7.12) is given in Appendix A.

There are now four cases: (a) a = /3 = 0, (b) a = 0, /3 7^ 0, (c) a 7^ 0, /3 = 0 and (d) a / 0, /3 7^ 0. 
Consider case (a). Then by (|7.8I) . (|7.9I) . (|7.10l) and (|7.11l) we hnd that H^f — x\\‘l 2 ^ ^^-^^Ibllia- 
Since /iM^ < 1, this now gives (17.71) for case (a). Now consider case (b). By (|7.8I) . (|7.9I) . (17.101) . 
(17.lip and (17.121) we get that 


Ib-Xlli2, <h^M^\ 


y(o,n 


1^2 (0,1) + 


|2 

Il 2 


< hM^\ 


( 0 , 13 ) 


|2 

Il 2 


( 0 , 13 ) 


Here in the final inequality we use the facts that < 1 and (1 -|- t)^ > 1 for t G [0,1]. Thus 

we get (|7.7I) in this case as well. Case (c) is near-identical to case (b). To complete the proof, we 
consider case (d). Using ()7.8p . (17.9p . (I7.10p and ()7.12p . we get 


which yields (17.7p . 


Ib-xlli. 


(C-./3) 


<hM^\ 


Ii2 


^(0,(3) 


□ 

Lemma 7.8. For $\alpha, \beta > -1$ let $\{\phi_i\}_{i \in \mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6), $T =
\{t_n\}_{n=1}^N \subseteq D$ be a set of scattered data points and suppose that $h$ is as in (2.2). If $hM^2 \le 1$ then

\[ E_\infty(h,M) \lesssim h M^2 \log M, \]

where $E_\infty(h,M)$ is as in (5.4).

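Proof. Let $x \in P_M(\ell^\infty(\mathbb{N}))$, $\|x\|_\infty = 1$ and set $g = \sum_{i=1}^M x_i \phi_i \in \mathbb{P}_{M-1}$ as in the previous proof.
Then

\[ \|(P_M - P_M U^* U P_M) x\|_\infty = \max_{i=1,\ldots,M} \left| \langle g, \phi_i \rangle_{L^2_{(\alpha,\beta)}} - \langle g, \phi_i \rangle_h \right|. \tag{7.13} \]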


Observe that 


{9j4^i)L^ {9i4’i)h 


N 


X] / i9{t)4>i{t) - 9{tn)(t>i{tn)) dt 


N 


< 


^ / [ i9{s)4>i{s)y ds 

77 ,= ! 


71= 

N 


dt 


< 


^ / n^°‘’^\t)dt [ |(5((s)(()i(s))'|d'S- 

JVn 


Since ||x||oo = 1, we have |(g((s)(/>i(s))'| < Yl,j=i \i4'iis)4>j{s)y\ and we now substitute into (|7.13l) to 
deduce that 

M N . 

\\{Pm - PmU*U Pm)x\\oo < max Y] V] / v^°''^\t)dt I |((()i(s)(()j(s))'| ds 

JV^ 


M N M 

max 7 y In = max > (Si + S_i + Sn ), 

j=l n=l j=l 


(7.14) 


where, as in the previous lemma, Si corresponds to [0,1], S_i corresponds to [—1,0] and So 
corresponds to the term In^ where 0 G Vng ■ 

Consider the term Si. By (IA.2h and (lA.Sp . we have 

| 0 f^(t)| < min I {Vl - i 2 fc+a+l/ 2 | ^ 0 < t < 1 . 

Since 1 < i,j < M, we have that 

N f. „ 

Si < ^ / (l-^rdt/ min|(Vl-t2)-2«-2jvf,M2“+3| dt. 

n=no+l “'Bi JVn 

Let $n_0 + 1 \le n^* \le N$ be arbitrary (its value will be chosen later) and split this sum into two according to $n^*$. Then

$$\begin{aligned}
S_1 &\lesssim M\sum_{n=n_0+1}^{n^*}\int_{V_n}(1-t)^{\alpha}\,dt\int_{V_n}\left(\sqrt{1-t^2}\right)^{-2\alpha-2}dt + M^{2\alpha+3}\sum_{n=n^*+1}^{N}\int_{V_n}(1-t)^{\alpha}\,dt\int_{V_n}dt \\
&= M S_1^{-} + M^{2\alpha+3} S_1^{+}.
\end{aligned} \tag{7.15}$$

We consider $S_1^{\pm}$ separately. For $S_1^{+}$, we have

$$S_1^{+} \lesssim h\int_{z^*}^{1}(1-t)^{\alpha}\,dt \lesssim h(1-z^*)^{\alpha+1}, \tag{7.16}$$

where $z^*$ is the right endpoint of $V_{n^*}$. Now consider $S_1^{-}$.

$$S_1^{-} \lesssim h\sum_{n=n_0+1}^{n^*}\int_{V_n}(1-t)^{\alpha}\sup_{t\in V_n}(1-t)^{-\alpha-1}\,dt = h\sum_{n=n_0+1}^{n^*}\int_{V_n}(1-t)^{\alpha}(1-z_n)^{-\alpha-1}\,dt,$$

where $z_n$ is the right endpoint of $V_n$. However, if $t \in V_n$ is arbitrary then $z_n \le t + h$. Thus $(1-z_n)^{-\alpha-1} \le (1-t-h)^{-\alpha-1}$ and this gives

$$S_1^{-} \lesssim h\int_{0}^{z^*}(1-t)^{\alpha}(1-t-h)^{-\alpha-1}\,dt \lesssim h\int_{1-z^*}^{1}s^{-1}(1-h/s)^{-\alpha-1}\,ds.$$

Suppose now that $n^*$ is chosen so that

$$1 - 2/M^2 - 2h < z^* \le 1 - 2/M^2 \tag{7.17}$$

(recall that the Voronoi cells are of width at most $2h$, hence such a choice is possible). Then, since $hM^2 < 1$, we find that $1 - h/s \gtrsim 1$ for $1 - z^* < s < 1$. Therefore, if (7.17) holds we have

$$S_1^{-} \lesssim h\int_{1-z^*}^{1}s^{-1}\,ds \lesssim h\,|\log(1-z^*)| \lesssim h\log(M). \tag{7.18}$$


Combining this with (7.16) and (7.17), and using the fact that $hM^2 < 1$ once more, we now get that $S_1^{+} \lesssim h(M^{-2} + h)^{\alpha+1} \lesssim hM^{-2\alpha-2}$. Substituting this and (7.18) back into (7.15) now gives

$$S_1 \lesssim hM\log(M) + hM \lesssim hM\log(M), \tag{7.19}$$


which completes the estimate for $S_1$. The estimate for $S_{-1}$ is near-identical, except that we use (A.9) as well as (A.8), since $S_{-1}$ sums over integrals contained in the negative portion of the interval. Hence we get

$$S_{-1} \lesssim hM\log(M), \tag{7.20}$$

for this term as well. Next we need to estimate

$$S_0 = \int_{V_{n_0}}\nu^{(\alpha,\beta)}(t)\,dt\int_{V_{n_0}}|(\phi_i(s)\phi_j(s))'|\,ds.$$

Since $V_{n_0} \subseteq [-2h,2h]$, we have that $\nu^{(\alpha,\beta)}(t) \lesssim 1$ for $t \in V_{n_0}$. Also, by (A.8) and (A.9), we have $|(\phi_i(s)\phi_j(s))'| \lesssim M$, $s \in V_{n_0}$. Hence we get $S_0 \lesssim hM$. Combining this with (7.19) and (7.20) and substituting into (7.14) now gives

$$\|(P_M - P_M U^*U P_M)x\|_\infty \lesssim \sum_{j=1}^{M} hM\log M = hM^2\log M,$$

from which the result follows immediately. □
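As a rough numerical illustration of Lemma 7.8, the following sketch evaluates the largest absolute row sum of $I - G$ in the Legendre case $\alpha = \beta = 0$, where $G$ is the Gram matrix of the first $M$ orthonormal polynomials under the discrete inner product. Several ingredients are assumptions, since the definitions are not restated in this section: the discrete inner product is taken to be $\langle f,g\rangle_h = \sum_n \mu_n f(t_n)g(t_n)$ with $\mu_n = \int_{V_n}\nu^{(\alpha,\beta)}(t)\,dt$, and $h$ is taken to be the largest distance from a point of a Voronoi cell to its node. All function names are hypothetical.

```python
import math

def voronoi_edges(points, a=-1.0, b=1.0):
    """Edges of the 1D Voronoi cells of sorted points in [a, b]."""
    mids = [(s + t) / 2 for s, t in zip(points, points[1:])]
    return [a] + mids + [b]

def mesh_norm(points):
    """Assumed form of h: largest distance from a point of a cell to its node."""
    edges = voronoi_edges(points)
    return max(max(t - lo, hi - t) for t, lo, hi in zip(points, edges, edges[1:]))

def legendre_orthonormal(M, t):
    """Values phi_1(t), ..., phi_M(t), with phi_j = sqrt((2j-1)/2) * P_{j-1}."""
    p = [1.0, t]
    for k in range(1, M):
        # Three-term recurrence: (k+1) P_{k+1} = (2k+1) t P_k - k P_{k-1}.
        p.append(((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2) * p[k] for k in range(M)]

def e_inf(points, M):
    """Largest absolute row sum of I - G (the l^inf operator norm of I - G)."""
    edges = voronoi_edges(points)
    mu = [hi - lo for lo, hi in zip(edges, edges[1:])]  # cell integrals of nu = 1
    vals = [legendre_orthonormal(M, t) for t in points]
    worst = 0.0
    for i in range(M):
        row = 0.0
        for j in range(M):
            g_ij = sum(m * v[i] * v[j] for m, v in zip(mu, vals))
            row += abs((1.0 if i == j else 0.0) - g_ij)
        worst = max(worst, row)
    return worst

def midpoints(N):
    """Midpoints of N equal subintervals of [-1, 1]."""
    return [-1 + (2 * n + 1) / N for n in range(N)]

if __name__ == "__main__":
    M = 5
    for N in (100, 200, 400):
        pts = midpoints(N)
        h = mesh_norm(pts)
        # Second column: computed quantity; third: the scale h M^2 log M.
        print(N, e_inf(pts, M), h * M**2 * math.log(M))
```

The printed values only indicate the trend as $h$ decreases; the constant hidden in $\lesssim$ is not estimated here.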
Lemma 7.9. For $\alpha,\beta > -1$ let $\{\phi_i\}_{i\in\mathbb{N}}$ be the orthonormal Jacobi polynomial basis (2.6), $T = \{t_n\}_{n=1}^{N} \subseteq D$ be a set of $N$ scattered data points and suppose that $h$ is as in (2.2). Suppose that $hM^2 < 1$. If the weights $w_i \gtrsim i^{q+1/2}$, where $q$ is as in (7.1), then the quantity $F(h,M,R)$ defined by (5.5) satisfies

$$F(h,M,R) \lesssim \sqrt{M}\,\sup_{i>R}\left\{ i^{q+1/2}/w_i \right\}. \tag{7.21}$$

Moreover, if the weights $w_i \gtrsim i\log i$ and $R > M$ then

$$F(h,M,R) \lesssim hM\,\sup_{i>R}\left\{ \frac{i\log i}{w_i} \right\}. \tag{7.22}$$

Proof. As before, let $x \in P_M(\ell^\infty(\mathbb{N}))$, $\|x\|_\infty = 1$ and set $g = \sum_{j=1}^{M} x_j\phi_j \in \mathbb{P}_{M-1}$. Then


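$$\|P_R^{\perp}W^{-1}U^*UP_Mx\|_\infty = \sup_{i>R}\left\{\frac{|\langle g,\phi_i\rangle_h|}{w_i}\right\} \le \|g\|_h\,\sup_{i>R}\left\{\frac{\|\phi_i\|_h}{w_i}\right\}.$$

Note that $\|\phi_i\|_h \lesssim i^{q+1/2}$ by (2.7) and also

$$\|g\|_h^2 = \langle P_MU^*UP_Mx, x\rangle \le \left(1 + \|P_M - P_MU^*UP_M\|\right)\|x\|^2 \lesssim M,$$

where in the final inequality we use Lemma 7.7 and the fact that $\|x\| \le \sqrt{M}\|x\|_\infty = \sqrt{M}$. This now gives (7.21).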

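For (7.22) we use orthogonality and the fact that $R > M$ to get

$$\|P_R^{\perp}W^{-1}U^*UP_Mx\|_\infty = \sup_{i>R}\left\{\frac{1}{w_i}\left|\langle g,\phi_i\rangle_h\right|\right\} = \sup_{i>R}\left\{\frac{1}{w_i}\left|\langle g,\phi_i\rangle_{L^2_{\nu^{(\alpha,\beta)}}} - \langle g,\phi_i\rangle_h\right|\right\}. \tag{7.23}$$

We now proceed in a similar manner to the proof of Lemma 7.8. First, since $\|x\|_\infty = 1$ we have

$$\left|\langle g,\phi_i\rangle_{L^2_{\nu^{(\alpha,\beta)}}} - \langle g,\phi_i\rangle_h\right| \le \sum_{j=1}^{M}\sum_{n=1}^{N}\int_{V_n}\nu^{(\alpha,\beta)}(t)\,dt\int_{V_n}|(\phi_i(s)\phi_j(s))'|\,ds.$$

We now argue in a similar way, using the fact that $j \le M < R < i$. This gives

$$\left|\langle g,\phi_i\rangle_{L^2_{\nu^{(\alpha,\beta)}}} - \langle g,\phi_i\rangle_h\right| \lesssim \sum_{j=1}^{M} h\,i\log(i) \le Mh\,i\log(i).$$

Substituting back into (7.23) now gives the required result. □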

We are now ready to prove Theorem 7.1.

Proof of Theorem 7.1. We use Theorem 5.6 and the estimates of Lemmas 7.7-7.9. Note that

$$\min_{M<i\le R}|w_i|,\ \max_{1\le i\le M}|w_i| \asymp M^{\gamma+q+1/2}, \qquad M \to \infty.$$

Hence, we require $h$, $M$ and $R > M$ such that

$$E_2(h,M) \le \varepsilon, \qquad E_\infty(h,R) \le \varepsilon, \qquad F(h,M,R) \lesssim \varepsilon M^{-1/2-\gamma-q}.$$

Since $R > M$, Lemmas 7.7 and 7.8 give that the first two conditions are satisfied provided

$$h \lesssim \frac{\varepsilon}{R^2\log R}. \tag{7.24}$$
We now consider $F(h,M,R)$. Suppose first that $0 < \gamma \le 1/2 - q$. Then Lemma 7.9 gives that $F(h,M,R) \lesssim \varepsilon M^{-1/2-\gamma-q}$ provided $R \gtrsim M^{1+(q+1)/\gamma}$. Hence this results in the condition

$$h \lesssim \frac{\varepsilon}{M^{2+2(q+1)/\gamma}\log M},$$

which gives the result for $0 < \gamma \le 1/2 - q$. Now suppose $\gamma > 1/2 - q$. Then $w_i = i^{\gamma+q+1/2} \gtrsim i\log i$ and therefore Lemma 7.9 gives $F(h,M,R) \lesssim \varepsilon M^{-1/2-\gamma-q}$ whenever

$$h\,\frac{\log R}{R^{\gamma+q-1/2}} \lesssim \varepsilon M^{-3/2-\gamma-q}.$$

Suppose that $R = cM$ for some $c$. Then the above condition holds, provided

$$h \lesssim \frac{\varepsilon}{M^2\log M}.$$

Moreover, substituting $R = cM$ into (7.24) gives precisely the same condition on $h$ in terms of $M$. Hence the result follows. □
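The last step of this proof, for $\gamma > 1/2 - q$, can be written out term by term. The following is a sketch under two assumptions inferred from the estimates above rather than restated in this section: that the weights are $w_i = i^{\gamma+q+1/2}$, and that the requirement on $F$ is $F(h,M,R) \lesssim \varepsilon M^{-1/2-\gamma-q}$. It starts from the bound (7.22).

```latex
\begin{align*}
F(h,M,R) &\lesssim hM \sup_{i>R}\frac{i\log i}{w_i}
          = hM \sup_{i>R}\frac{\log i}{i^{\gamma+q-1/2}}
          \lesssim hM\,\frac{\log R}{R^{\gamma+q-1/2}}, \\
\text{and with } R = cM:\qquad
F(h,M,cM) &\lesssim h\,M^{3/2-\gamma-q}\log M .
\end{align*}
% Imposing the assumed requirement on F and cancelling powers of M:
\[
h\,M^{3/2-\gamma-q}\log M \lesssim \varepsilon\,M^{-1/2-\gamma-q}
\quad\Longleftrightarrow\quad
h \lesssim \frac{\varepsilon}{M^{2}\log M}.
\]
```

The exponent $\gamma + q - 1/2$ is positive in this case, which is what allows the supremum over $i > R$ to be bounded by its value near $i = R$ for large $R$.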


The proof of Theorem 7.2 is straightforward:

Proof of Theorem 7.2. We use Theorem 5.7 in combination with Lemma 7.7. □

Finally, we also give the proof of Theorem 6.5.


Proof of Theorem \6.5[ We shall use Lemma lOl Let y £ C^, ||y|| = 1 be given. Let x £ 1) 

be a smooth compactly-supported function in (— 1 , 1 ) with the properties 


11x11 = 1 , x(0) = l, 0 <x(x) < 1 , x£ (- 1 , 1 ). 


Define 

N / \ 

g{t) = gn{t) = x\ . , 

where $\tau_n = \mathrm{dist}(t_n, \partial V_n)$ is the distance of the point $t_n$ from the boundary of its Voronoi cell (in
this one-dimensional setting, $V_n$ is an interval and its boundary is the set of the two endpoints).
Observe that

$$\tau_n = \tfrac{1}{2}\min\{t_{n+1} - t_n,\ t_n - t_{n-1}\}, \qquad n = 1,\ldots,N,$$

and therefore f,n > f for each n = 1,... , A^. By construction supp(( 7 ri) C Vn, n = 1,... , N, and 
therefore supp(g(n) nsupp(( 7 m) = 0, n / m. It follows that g £ Gy. Since g £ 1,1] a standard 

result in polynomial approximation gives that 

inf \\g-cP\\oo<CrK-^\\g^^^\\oo, K > r, 


for some constant Cr > 0 independent of K and g (see [lOp (5.4.16)]). Observe that 


||fi((^y|oo= niax 


IVn 


n=i,...,N Dr, 


ir^||oo<||y|| max 


iCn 


n=l,...,Ar Dr, 
where in the last inequality we use that fact that ^ Tn- Hence


inf lb - 0 IIOO < K>r. (7.25) 

ct>£'S>K 


In order to apply 
we have 


Ibl 


2 


(EU), it remains to estimate |b|b = IbH- Since the g^s have disjoint supports, 


N 


= E 

n=l 




min {Cn/Tn} > Cn/ niax r^. 
n=l,...,N n=l,...,N 


Observe that t„ < 2h. Therefore, IbH > ^^^/{2h). Substituting this and (17.251) into (16.11) now gives 
the result. □ 
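The construction used in this proof can be sketched numerically. The code below reads the construction as $g(t) = \sum_{n} y_n\,\chi\bigl((t - t_n)/\tau_n\bigr)$ with $\tau_n = \mathrm{dist}(t_n, \partial V_n)$; that reading, the specific bump $\chi(x) = \exp\bigl(1 - 1/(1-x^2)\bigr)$, and all function names are assumptions made for illustration, and the paper's normalization of $\chi$ is not reproduced. The point of the sketch is that the scaled bumps have disjoint supports, so $g$ interpolates the data $y$ at the nodes.

```python
import math

def voronoi_edges(points, a=-1.0, b=1.0):
    """Edges of the 1D Voronoi cells of sorted points in [a, b]."""
    mids = [(s + t) / 2 for s, t in zip(points, points[1:])]
    return [a] + mids + [b]

def tau(points):
    """Distance of each node from the boundary of its own Voronoi cell."""
    edges = voronoi_edges(points)
    return [min(t - lo, hi - t) for t, lo, hi in zip(points, edges, edges[1:])]

def bump(x):
    """Smooth bump supported in (-1, 1) with bump(0) = 1 and values in [0, 1]."""
    return math.exp(1 - 1 / (1 - x * x)) if abs(x) < 1 else 0.0

def interpolant(points, y):
    """g(t) = sum_n y_n * bump((t - t_n) / tau_n); each term lives inside V_n."""
    taus = tau(points)
    def g(t):
        return sum(c * bump((t - s) / r) for c, s, r in zip(y, points, taus))
    return g

if __name__ == "__main__":
    pts = [-0.9, -0.5, 0.1, 0.2, 0.8]
    y = [1.0, -2.0, 0.5, 3.0, -1.0]
    g = interpolant(pts, y)
    print([g(t) for t in pts])  # reproduces y at the nodes
    print(tau(pts))
```

Clustered nodes (here 0.1 and 0.2) produce a small $\tau_n$ and hence a narrow, steep bump, which is the mechanism by which derivatives of $g$ grow in the argument above.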


Finally, we now note that Theorem 1.1 follows immediately from Theorems 6.5 and 7.1.


8 Trigonometric polynomials on bounded intervals 


We now consider Example 2.2. Note that in this case we define the projections $P_N : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ by $P_N x = \{\ldots,0,0,x_{-N},x_{-N+1},\ldots,x_{N-1},0,0,\ldots\}$. Our main result is as follows:

Theorem 8.1. Let $\{\phi_i\}_{i\in\mathbb{Z}}$ be the Fourier basis (2.8), $T = \{t_n\}_{n=1}^{N} \subseteq [-1,1]$ be a set of $N$ scattered data points and suppose that $h$ is as in (2.2). Suppose that the weights $w_i = 1 + |i|^{\gamma}$ for some $\gamma > 0$. Then for each $0 < \varepsilon < 1/2$ there exists a $c(\varepsilon) > 0$ such that if

$$h \le c(\varepsilon)\begin{cases} M^{-3/2-3/(4\gamma)} & 0 < \gamma < 1, \\ M^{-3/2} & \gamma \ge 1, \end{cases}$$

then any minimizer $\hat{x}$ of (3.6) satisfies


for some constant C depending on e only, where TN,K,r){x) is as in (|5. Tj ). 


For least-squares fitting, we have the following:

Theorem 8.2. Let $\{\phi_i\}_{i\in\mathbb{Z}}$ be the Fourier basis (2.8), $T = \{t_n\}_{n=1}^{N} \subseteq [-1,1]$ be a set of $N$ scattered data points and let $h$ be as in (2.2). Then for each $0 < \varepsilon < 1$ there exists a $c(\varepsilon) > 0$ such that if

$$h \le c(\varepsilon)M^{-1}, \tag{8.1}$$


then the solution x of L3. exists uniquely and satisfies 


\\x — x|| < 

for any w = {tCjjieN with Wi>l. 


^1 + ~ Pmx\\i,w + 


VT^e 


Unlike the case of Jacobi polynomials, these results give a worse recovery guarantee for weighted
minimization than that of least-squares fitting. However, we do not believe the scaling $h \lesssim M^{-3/2}$
is sharp, and instead we conjecture that the true scaling is $h \lesssim (M\log M)^{-1}$. Proving this
conjecture is an open problem. We remark in passing that this conjecture holds in the special case
where the data is equispaced (we omit the proof for brevity's sake).
Figure 6: Numerical comparison of weighted minimization and least-squares fitting for approximation
from jittered data. The error against $N$ is plotted for each method. The solid black line is weighted
minimization with $K = 4N$ and weights $w_i = \sqrt{i}$. The dashed lines are least squares with $M = cN$
for several values of $c$. The solid blue line is oracle least squares based on choosing $M$ to minimize
the error for a given $N$ and $f$. Random noise of magnitude $10^{-8}$ was added to the data.


8.1 Numerical examples 

In Fig. 6 we give a comparison of the two techniques for jittered data. In alignment with the
discussion above, these results suggest the scaling predicted by Theorem 8.1 is not optimal. In
fact, weighted minimization performs better than least-squares fitting (including the oracle case) 
in all the examples. We suspect this strong performance is due in part to the presence of some 
sparsity in the functions considered, and the fact that jittered points are near-optimal points for 
the recovery of sparse trigonometric polynomials m- 


8.2 Proofs 


We first require the following lemma:

Lemma 8.3. Let $\{\phi_i\}_{i\in\mathbb{Z}}$ be the orthonormal Fourier basis (2.8), $T = \{t_n\}_{n=1}^{N} \subseteq D$ be a set of $N$ scattered data points and suppose that $h$ is as in (2.2). Suppose that $hM < 1$. If $E_2(h,M)$ and $E_\infty(h,M)$ are as defined in Section 5, then

$$E_2(h,M) \lesssim hM, \qquad E_\infty(h,M) \lesssim hM^{3/2}.$$


Proof. Consider $E_2(h,M)$ first. As in the proof of Lemma 7.7 let $x \in P_M(\ell^2(\mathbb{Z}))$, $\|x\| = 1$ be arbitrary and set $g = \sum_{j=-M}^{M-1} x_j\phi_j$ so that $\|g\|_{L^2} = 1$. Arguing in an identical manner, we see that it suffices to show that

$$\sum_{n=1}^{N}\int_{V_n}|g(t) - g(t_n)|^2\,dt \lesssim h^2M^2. \tag{8.2}$$

Observe that $|g(t) - g(t_n)|^2 \le h\int_{V_n}|g'(s)|^2\,ds$ and therefore $\sum_{n=1}^{N}\int_{V_n}|g(t) - g(t_n)|^2\,dt \lesssim h^2\|g'\|_{L^2}^2$. To get (8.2) we recall Bernstein's inequality $\|g'\|_{L^2} \le M\|g\|_{L^2}$ for trigonometric polynomials.

For $E_\infty(h,M)$ we let $x \in P_M(\ell^\infty(\mathbb{Z}))$ with $\|x\|_\infty = 1$ and define $g$ as before. As in the proof of Lemma 7.8 it suffices to estimate

$$\max_{i=-M,\ldots,M-1}\left|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h\right|.$$

Arguing in the standard way, we see that

$$\max_{i}\left|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h\right| \lesssim h\|(g\phi_i)'\|_{L^1} \lesssim h\|(g\phi_i)'\|_{L^2}.$$

Since $g\phi_i$ is a trigonometric polynomial of degree at most $2M$, Bernstein's inequality gives

$$\left|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h\right| \lesssim hM\|x\|_2 \lesssim hM^{3/2},$$

where in the final inequality we use the Cauchy-Schwarz inequality and the fact that $\|x\|_\infty = 1$. This now gives the estimate for $E_\infty(h,M)$. □
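The key inequality (8.2) can be checked numerically on jittered points. The sketch below makes several assumptions that are not fixed by this section: the Fourier basis is taken to be $\phi_j(t) = e^{i\pi jt}/\sqrt{2}$ on $[-1,1]$, $h$ is the largest distance from a point of a Voronoi cell to its node, and the jitter model (uniform shifts of size at most $\theta/N$) is only one possible choice. With this normalization Bernstein's inequality reads $\|g'\|_{L^2} \le \pi M\|g\|_{L^2}$, and the Cauchy-Schwarz argument above gives the explicit bound $2(\pi hM)^2$ for $\|x\| = 1$; the constants are part of this sketch, not of the paper. All function names are hypothetical.

```python
import cmath
import math
import random

def voronoi_edges(points, a=-1.0, b=1.0):
    """Edges of the 1D Voronoi cells of sorted points in [a, b]."""
    mids = [(s + t) / 2 for s, t in zip(points, points[1:])]
    return [a] + mids + [b]

def mesh_norm(points):
    """Assumed form of h: largest distance from a point of a cell to its node."""
    edges = voronoi_edges(points)
    return max(max(t - lo, hi - t) for t, lo, hi in zip(points, edges, edges[1:]))

def jittered(N, theta=0.5, seed=0):
    """Midpoints of a uniform grid on [-1, 1], each shifted by at most theta/N."""
    rng = random.Random(seed)
    return [-1 + (2 * n + 1) / N + theta * rng.uniform(-1, 1) / N for n in range(N)]

def trig_poly(x, M):
    """g(t) = sum_{j=-M}^{M-1} x_j exp(i pi j t) / sqrt(2); x is stored at 0..2M-1."""
    def g(t):
        return sum(c * cmath.exp(1j * math.pi * (k - M) * t)
                   for k, c in enumerate(x)) / math.sqrt(2)
    return g

def simpson(f, lo, hi, panels=16):
    """Composite Simpson rule on [lo, hi] with an even number of panels."""
    w = (hi - lo) / panels
    total = f(lo) + f(hi)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(lo + k * w)
    return total * w / 3

def cell_error(points, g):
    """Left-hand side of (8.2): sum over cells of the integral of |g(t) - g(t_n)|^2."""
    edges = voronoi_edges(points)
    total = 0.0
    for t_n, lo, hi in zip(points, edges, edges[1:]):
        g_n = g(t_n)
        total += simpson(lambda t: abs(g(t) - g_n) ** 2, lo, hi)
    return total

def random_unit_coeffs(M, seed=1):
    """Random complex coefficient vector of length 2M with unit l^2 norm."""
    rng = random.Random(seed)
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 * M)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in x))
    return [c / norm for c in x]

if __name__ == "__main__":
    N, M = 60, 5
    pts = jittered(N)
    h = mesh_norm(pts)
    lhs = cell_error(pts, trig_poly(random_unit_coeffs(M), M))
    print(lhs, 2 * (math.pi * h * M) ** 2)  # computed left side, explicit bound
```

For this jitter model one has $h \le (1+\theta)/N$, so the bound scales like $(M/N)^2$, in line with the condition $hM < 1$ of the lemma.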

Lemma 8.4. Let $\{\phi_i\}_{i\in\mathbb{Z}}$ be the orthonormal Fourier basis (2.8), $T = \{t_n\}_{n=1}^{N} \subseteq D$ be a set of $N$ scattered data points and suppose that $h$ is as in (2.2). Suppose that $hM < 1$. Then the quantity $F(h,M,R)$ defined by (5.5) satisfies

$$F(h,M,R) \lesssim \sqrt{M}\,\sup_{i>R}\{1/w_i\}.$$

Moreover, if the weights $w_i \gtrsim i$ and $R > M$, then

$$F(h,M,R) \lesssim h\sqrt{M}\,\sup_{i>R}\{i/w_i\}.$$

Proof. We argue as in the proof of Lemma 7.9. In the first case, since the functions $\phi_i$ are uniformly bounded we have $\|P_R^{\perp}W^{-1}U^*UP_Mx\|_\infty \lesssim \|g\|_h\sup_{i>R}\{1/w_i\}$, where $g = \sum_j x_j\phi_j$ and $\|x\|_\infty = 1$. By the same argument, we find that $\|g\|_h \lesssim \|g\|_{L^2} \lesssim \sqrt{M}$, which gives the first result.

Now consider the second. With $x$ and $g$ as before, we have

$$\|P_R^{\perp}W^{-1}U^*UP_Mx\|_\infty = \sup_{i>R}\left\{\frac{1}{w_i}\left|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h\right|\right\}.$$

As in the previous lemma, we note that $|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h| \lesssim h\|(g\phi_i)'\|_{L^2}$, and therefore by Bernstein's inequality $|\langle g,\phi_i\rangle_{L^2} - \langle g,\phi_i\rangle_h| \lesssim h\,i\,\|x\|_2 \lesssim h\,i\sqrt{M}$. This gives the second result. □

Proof of Theorem 8.1. With this lemma in hand, the proof is identical in manner to that of Theorem
7.1. We omit the details. □

9 Conclusions 

We have presented an infinite-dimensional framework for weighted minimization. Its advantages 
are that it does not require a priori knowledge of the expansion tail in order to be implemented 
and in the absence of noise it leads to interpolatory approximations. We have discussed the role 
weights play in the minimization in resolving the aliasing phenomenon, as opposed to promoting 
smoothness, and provided an explicit way to choose the truncation parameter. In the second half
of this paper we performed a linear error analysis for this framework valid for arbitrary scattered data,
and used it to show near-optimal performance for Jacobi polynomial bases.

There are several topics for future research. Three immediate problems are (i) to obtain a
better scaling than (8.1) in the trigonometric polynomial case, (ii) to estimate the truncation error
term in a way that does not require additional regularity of x (see Theorem 6.1 and the associated
remark), and (iii) to improve the noise bound in Theorem 7.1 (see Remark 7.6). Besides these, a question
of singular importance is the extension of this work to higher dimensions and to unbounded intervals
(using Laguerre and Hermite polynomials, for example). Other higher-dimensional problems can
also be investigated, such as approximations in spherical harmonics (see also [28]).

Another topic is the optimal selection of weights. The results of this paper suggest that weights 
aid approximations from deterministic, scattered data by resolving the aliasing phenomenon and 
not necessarily by matching the decay of the expansion coefficients. In particular, slowly growing
weights seem sufficient, at least in the one-dimensional setting. The situation may be different 
however in the multidimensional case when the samples are random. It has recently been shown in 
mm that weighted minimization with a specific choice of weights leads to optimal approximation 
rates for certain classes of multivariate functions when the samples are drawn randomly from the 
orthogonality measure of the polynomial basis. We also note the possibility of using reweighted
minimization, where the weights are iteratively updated to get a better estimate of the support set of x
[25, 35]. We expect this technique can be combined with our framework.
A Jacobi polynomials

Given $\alpha,\beta > -1$ let $P_j^{(\alpha,\beta)}$ be the Jacobi polynomial of degree $j$. These polynomials are orthogonal
on $D = (-1,1)$ with respect to $\nu^{(\alpha,\beta)}(t) = (1-t)^{\alpha}(1+t)^{\beta}$, with

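$$\left\langle P_j^{(\alpha,\beta)}, P_k^{(\alpha,\beta)}\right\rangle_{L^2_{\nu^{(\alpha,\beta)}}} = \delta_{jk}\,\kappa_j^{(\alpha,\beta)},$$

where

$$\kappa_j^{(\alpha,\beta)} = \frac{2^{\alpha+\beta+1}}{2j+\alpha+\beta+1}\,\frac{\Gamma(j+\alpha+1)\,\Gamma(j+\beta+1)}{j!\,\Gamma(j+\alpha+\beta+1)}, \tag{A.1}$$

and have the normalization

$$P_j^{(\alpha,\beta)}(1) = \binom{j+\alpha}{j}.$$

The corresponding orthonormal polynomials are defined by $\phi_j(t) = \bigl(\kappa_{j-1}^{(\alpha,\beta)}\bigr)^{-1/2}P_{j-1}^{(\alpha,\beta)}(t)$, $j \in \mathbb{N}$. Note that

$$\kappa_j^{(\alpha,\beta)} \sim 2^{\alpha+\beta}j^{-1}, \qquad j \to \infty, \tag{A.2}$$

and also that

$$P_j^{(\alpha,\beta)}(1) \sim \frac{j^{\alpha}}{\Gamma(\alpha+1)}, \qquad j \to \infty.$$

The polynomials satisfy the differential equation

$$-\left(\nu^{(\alpha+1,\beta+1)}(t)\,\bigl(P_j^{(\alpha,\beta)}\bigr)'(t)\right)' = \lambda_j^{(\alpha,\beta)}\,\nu^{(\alpha,\beta)}(t)\,P_j^{(\alpha,\beta)}(t), \tag{A.3}$$

where $\lambda_j^{(\alpha,\beta)} = j(j+\alpha+\beta+1)$. In particular, the derivatives $\bigl(P_j^{(\alpha,\beta)}\bigr)'$ are orthogonal with respect
to $\nu^{(\alpha+1,\beta+1)}$ and satisfy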


P 




A(“A)^(«>/3) 




0+1) 




(A.4) 
Lemma A.l. Let a,l3 > —1. Then ||p'||j ;^2 < ||p||j ;^2 , Vp G Pm-

^(q + 1,/3 + 1) u{a,P) 

Proof. Let p G Pm be arbitrary and observe that 

M 




j=0 




Note that 


Similarly, 


\\pfr^ 


M 

E 


p'it) = ^ 




M-l 

_El_p(“+h/3+l) v =(v 

(o+l,/3+l)^i 'b’-'j' iL^ 

j=0 


(a+l./3+l) ' 


and 


IVj 


^l{a + l,/3 + l) ("+l:/3 + l) 

j=0 I^j 


(A.5) 


(A.6) 


Consider Xj. By the differential equation (IA.3p and the fact that fPl) = 0, we have 


Hence, by (jA.4p . 


Xt = 


(«./3) 

fh 


_ (' p{“+ii/3+i)\ 

^ _^(a,/ 3 )^(a+l,/ 3 +l) > j-l 


^-1 




«A) 




(“./3) («+l,/3+l) 
4 '^3-1 


Vj-i- 


Using (IA.5h and (IA.6I) we now get that 

M 


Il 2 


lyj-i 


(a,/3) ~ \ ("i/5)^(“+li/3+l) “ \(“A) -^^(q,+i,/3+i) ’ 

i=l ■^7 


as required. 


□ 


We also require several results concerning the asymptotic behaviour of Jacobi polynomials. The
first is as follows (see [31, Thm. 7.32.1]):

$$\|P_j^{(\alpha,\beta)}\|_{L^\infty} = O\left(j^{q}\right), \qquad j \to \infty, \qquad q = \max\{\alpha,\beta,-1/2\}.$$

Hence, using (A.2) we find that the normalized functions $\phi_j$ defined by (2.6) satisfy

$$\|\phi_j\|_{L^\infty} = O\left(j^{q+1/2}\right), \qquad j \to \infty, \tag{A.7}$$

which gives (2.7). We also note the following local estimates for Jacobi polynomials. If $k = 0,1,2,\ldots$
and $c > 0$ is a fixed constant then

$$\left.\frac{d^k P_j^{(\alpha,\beta)}(t)}{dt^k}\right|_{t=\cos\theta} = \begin{cases} \theta^{-k-\alpha-1/2}\,O\left(j^{k-1/2}\right) & cj^{-1} \le \theta \le \pi/2, \\ O\left(j^{2k+\alpha}\right) & 0 \le \theta \le cj^{-1}, \end{cases} \tag{A.8}$$

as $j \to \infty$. See [31, Thm. 7.32.4]. This estimate bounds the Jacobi polynomial and its derivatives
for $0 \le t \le 1$. For negative $t$, we may use the relation

$$P_j^{(\alpha,\beta)}(-t) = (-1)^{j}P_j^{(\beta,\alpha)}(t). \tag{A.9}$$

Hence the behaviour of $P_j^{(\alpha,\beta)}$ and its derivatives for $t < 0$ is given by (A.8) with $\alpha$ replaced by $\beta$.